diff --git a/03-visualization.Rmd b/03-visualization.Rmd index 8ad82d359..4d7c9d743 100755 --- a/03-visualization.Rmd +++ b/03-visualization.Rmd @@ -15,7 +15,7 @@ knitr::opts_chunk$set( fig.height = 4, fig.align='center', warning = FALSE - ) +) options(scipen = 99, digits = 3) @@ -29,7 +29,7 @@ set.seed(76) We begin the development of your data science toolbox with data visualization. By visualizing our data, we gain valuable insights that we couldn't initially see from just looking at the raw data in spreadsheet form. We will use the `ggplot2` package as it provides an easy way to customize your plots. `ggplot2` is rooted in the data visualization theory known as _The Grammar of Graphics_ [@wilkinson2005]. -At the most basic level, graphics/plots/charts (we use these terms interchangeably in this book) provide a nice way for us to get a sense for how quantitative variables compare in terms of their center (where the values tend to be located) and their spread (how they vary around the center). Graphics should be designed to emphasise the findings and insight you want your audience to understand. This does however require a balancing act. On the one hand, you want to highlight as many meaningful relationships and interesting findings as possible; on the other you don't want to include so many as to overwhelm your audience. +At the most basic level, graphics/plots/charts (we use these terms interchangeably in this book) provide a nice way for us to get a sense for how quantitative variables compare in terms of their center (where the values tend to be located) and their spread (how they vary around the center). Graphics should be designed to emphasize the findings and insight you want your audience to understand. This does however require a balancing act. On the one hand, you want to highlight as many meaningful relationships and interesting findings as possible; on the other you don't want to include so many as to overwhelm your audience. As we will see, plots/graphics also help us to identify patterns and outliers in our data. We will see that a common extension of these ideas is to compare the *distribution* of one quantitative variable (i.e., what the spread of a variable looks like or how the variable is *distributed* in terms of its values) as we go across the levels of a different categorical variable. @@ -54,13 +54,13 @@ library(readr) ---- +*** ## The Grammar of Graphics {#grammarofgraphics} -We begin with a discussion of a theoretical framework for data visualization known as "The Grammar of Graphics," which serves as the foundation for the `ggplot2` package. Think of how we construct sentences in english to form sentences by combining different elements, like nouns, verbs, particles, subjects, objects, etc. However, we can't just combine these elements in any arbitrary order; we must do so following a set of rules known as a linguistic grammar. Similarly to a linguistic grammar, "The Grammar of Graphics" define a set of rules for contructing *statistical graphics* by combining different types of *layers*. This grammar was created by Leland Wilkinson [@wilkinson2005] and has been implemented in a variety of data visualization software including R. +We begin with a discussion of a theoretical framework for data visualization known as "The Grammar of Graphics," which serves as the foundation for the `ggplot2` package. Think of how we construct sentences in English to form sentences by combining different elements, like nouns, verbs, particles, subjects, objects, etc. 
However, we can't just combine these elements in any arbitrary order; we must do so following a set of rules known as a linguistic grammar. Similarly to a linguistic grammar, "The Grammar of Graphics" define a set of rules for constructing *statistical graphics* by combining different types of *layers*. This grammar was created by Leland Wilkinson [@wilkinson2005] and has been implemented in a variety of data visualization software including R. ### Components of the Grammar @@ -165,7 +165,7 @@ There are other components of the Grammar of Graphics we can control as well. A - `stat`istical transformations: this includes smoothing, binning values into a histogram, or no transformation at all (known as the `"identity"` transformation). --> -Other more complex components like `scales` and `coord`inate systems are left for a more advanced text such as [R for Data Science](http://r4ds.had.co.nz/data-visualisation.html#aesthetic-mappings) [@rds2016]. Generally speaking, the Grammar of Graphics allows for a high degree of customization of plots and also a consistent framework for easily updating and modifiying them. +Other more complex components like `scales` and `coord`inate systems are left for a more advanced text such as [R for Data Science](http://r4ds.had.co.nz/data-visualisation.html#aesthetic-mappings) [@rds2016]. Generally speaking, the Grammar of Graphics allows for a high degree of customization of plots and also a consistent framework for easily updating and modifying them. ### ggplot2 package @@ -180,7 +180,7 @@ Let's now put the theory of the Grammar of Graphics into practice. ---- +*** @@ -198,7 +198,7 @@ We will discuss some variations of these plots, but with this basic repertoire o ---- +*** @@ -367,7 +367,7 @@ With medium to large data sets, you may need to play around with the different m --> ---- +*** ## 5NG#2: Linegraphs {#linegraphs} @@ -438,11 +438,11 @@ Much as with the `ggplot()` code that created the scatterplot of departure and a ### Summary -Linegraphs, just like scatterplots, display the relationship between two numerical variables. However it is preferred to use lingraphs over scatterplots when the variable on the x-axis (i.e. the explanatory variable) has an inherent ordering, like some notion of time. +Linegraphs, just like scatterplots, display the relationship between two numerical variables. However it is preferred to use linegraphs over scatterplots when the variable on the x-axis (i.e. the explanatory variable) has an inherent ordering, like some notion of time. ---- +*** @@ -491,7 +491,7 @@ The remaining bins all have a similar interpretation. ### Histograms via geom_histogram {#geomhistogram} -Let's now present the `ggplot()` code to plot your first histogram! Unlike with scatterplots and linegraphs, there is now only one variable being mapped in `aes()`: the single numerical variable `temp`. The y-aesthetic of a histogram gets computed for you automatically. Furthemore, the geometric object layer is now a `geom_histogram()` +Let's now present the `ggplot()` code to plot your first histogram! Unlike with scatterplots and linegraphs, there is now only one variable being mapped in `aes()`: the single numerical variable `temp`. The y-aesthetic of a histogram gets computed for you automatically. 
Furthermore, the geometric object layer is now a `geom_histogram()` ```{r weather-histogram, warning=TRUE, fig.cap="Histogram of hourly temperatures at three NYC airports."} ggplot(data = weather, mapping = aes(x = temp)) + @@ -524,7 +524,7 @@ Observe in both Figure \@ref(fig:weather-histogram-2) and Figure \@ref(fig:weath Using the first method, we have the power to specify how many bins we would like to cut the x-axis up in. As mentioned in the previous section, the default number of bins is 30. We can override this default, to say 40 bins, as follows: -```{r, warning=FALSE, message=FALSE, fig.cap= "Histogram with 60 bins."} +```{r, warning=FALSE, message=FALSE, fig.cap= "Histogram with 40 bins."} ggplot(data = weather, mapping = aes(x = temp)) + geom_histogram(bins = 40, color = "white") ``` @@ -558,13 +558,13 @@ Histograms, unlike scatterplots and linegraphs, present information on only a si ---- +*** ## Facets {#facets} -Before continuing the 5NG, let's briefly introduce a new concept called *faceting*. Faceting is used when we'd like to split a particular visualization of variables by another variable. This will create mutiple copies of the same type of plot with matching x and y axes, but whose content will differ. +Before continuing the 5NG, let's briefly introduce a new concept called *faceting*. Faceting is used when we'd like to split a particular visualization of variables by another variable. This will create multiple copies of the same type of plot with matching x and y axes, but whose content will differ. For example, suppose we were interested in looking at how the histogram of hourly temperature recordings at the three NYC airports we saw in Section \@ref(histograms) differed by month. We would "split" this histogram by the 12 possible months in a given year, in other words plot histograms of `temp` for each `month`. We do this by adding `facet_wrap(~ month)` layer. @@ -574,7 +574,7 @@ ggplot(data = weather, mapping = aes(x = temp)) + facet_wrap(~ month) ``` -Note the use of the tilde `~` before `month` in `facet_wrap()`. The tilde is required and you'll receive the error `Error in as.quoted(facets) : object 'month' not found` if you don't include it before `month` here. We can also specify the number of rows and columns in the grid by using the `nrow` and `ncol` arguments inside of `facet_wrap()`. For example, say we would like our facetted plot to have 4 rows instead of 3. Add the `nrow = 4` argument to `facet_wrap(~ month)` +Note the use of the tilde `~` before `month` in `facet_wrap()`. The tilde is required and you'll receive the error `Error in as.quoted(facets) : object 'month' not found` if you don't include it before `month` here. We can also specify the number of rows and columns in the grid by using the `nrow` and `ncol` arguments inside of `facet_wrap()`. For example, say we would like our faceted plot to have 4 rows instead of 3. Add the `nrow = 4` argument to `facet_wrap(~ month)` ```{r facethistogram2, fig.cap="Faceted histogram with 4 instead of 3 rows."} ggplot(data = weather, mapping = aes(x = temp)) + @@ -601,7 +601,7 @@ Observe in both Figure \@ref(fig:facethistogram) and Figure \@ref(fig:facethisto ---- +*** @@ -732,7 +732,7 @@ It is important to keep in mind that the definition of an outlier is somewhat ar **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Which months have the highest variability in temperature? What reasons can you give for this? 
-**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** We looked at the distribution of a numerical variable over a categorical variable here with this boxplot. Why can't we look at the distribution of one numerical variable over the distribution of another numerical variable? Say, temperature across pressure, for example? +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** We looked at the distribution of the numerical variable `temp` split by the numerical variable `month` that we converted to a categorical variable using the `factor()` function. Why would a boxplot of `temp` split by the numerical variable `pressure` similarly converted to a categorical variable using the `factor()` not be informative? **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Boxplots provide a simple way to identify outliers. Why may outliers be easier to identify when looking at a boxplot instead of a faceted histogram? @@ -745,7 +745,7 @@ Side-by-side boxplots provide us with a way to compare and contrast the distribu ---- +*** @@ -985,7 +985,7 @@ Barplots are the preferred way of displaying the distribution of a categorical v ---- +*** @@ -1096,7 +1096,7 @@ ggplot(data = early_january_weather, mapping = aes(x = time_hour, y = temp)) + geom_line() ``` -These two code segments were a preview of Chapter \@ref(wrangling) on data wrangling where we'll delve further into the `dplyr` package. Data wrangling is the process of transforming and modifying existing data to with the intent of making it more appropriate for analysis purposes. For example, the two code segments used the `filter()` function to create new data frames (`alaska_flights` and `early_january_weather`) by choosing only a subset of rows of existing data frames (`flights` and `weather`). In this next chapter, we'll formally introduce the `filter()` and other data wrangling functions as well as the *pipe operator* `%>%` which allows you to combine multiple data wrangling actions into a single sequential *chain* of actions. On to Chapter \@ref(wrangling) on data wrangling! +These two code segments were a preview of Chapter \@ref(wrangling) on data wrangling where we'll delve further into the `dplyr` package. Data wrangling is the process of transforming and modifying existing data with the intent of making it more appropriate for analysis purposes. For example, the two code segments used the `filter()` function to create new data frames (`alaska_flights` and `early_january_weather`) by choosing only a subset of rows of existing data frames (`flights` and `weather`). In this next chapter, we'll formally introduce the `filter()` and other data wrangling functions as well as the *pipe operator* `%>%` which allows you to combine multiple data wrangling actions into a single sequential *chain* of actions. On to Chapter \@ref(wrangling) on data wrangling! 
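+As a parting sketch of what such a sequential chain of actions can look like (assuming the `nycflights13`, `dplyr`, and `ggplot2` packages are loaded), here is one possible way to recreate the Alaska Airlines scatterplot from this chapter in a single pipeline; each step is explained properly in Chapter \@ref(wrangling):
+
+```{r, eval=FALSE}
+# Keep only Alaska Airlines flights, then pipe the result directly into ggplot():
+flights %>% 
+  filter(carrier == "AS") %>% 
+  ggplot(mapping = aes(x = dep_delay, y = arr_delay)) + 
+  geom_point()
+```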
```{r echo=FALSE, fig.cap="ModernDive flowchart", out.width='110%', fig.align='center'} # knitr::include_graphics("images/flowcharts/flowchart/flowchart.004.png") diff --git a/04-wrangling.Rmd b/04-wrangling.Rmd index 7b4cf5a69..4b49905f5 100755 --- a/04-wrangling.Rmd +++ b/04-wrangling.Rmd @@ -13,7 +13,7 @@ knitr::opts_chunk$set( fig.height = 4, fig.align='center', warning = FALSE - ) +) options(scipen = 99, digits = 3) @@ -25,7 +25,7 @@ options(knitr.kable.NA = '') set.seed(76) ``` -So far in our journey, we've seen how to look at data saved in data frames using the `glimpse()` and `View()` functions in Chapter \@ref(getting-started) on and how to create data visualizations using the `ggplot2` package in Chapter \@ref(viz). In particular we study what we term the "five named graphs" (5NG): +So far in our journey, we've seen how to look at data saved in data frames using the `glimpse()` and `View()` functions in Chapter \@ref(getting-started) on and how to create data visualizations using the `ggplot2` package in Chapter \@ref(viz). In particular we studied what we term the "five named graphs" (5NG): 1. scatterplots via `geom_point()` 1. linegraphs via `geom_line()` @@ -33,9 +33,9 @@ So far in our journey, we've seen how to look at data saved in data frames using 1. histograms via `geom_histogram()` 1. barplots via `geom_bar()` or `geom_col()` -We created these visualization using the "Grammar of Graphics", which maps variables in a data frame to the aesthetic attributes of the above 5 `geom`etric objects. We can also control other aesthetic attributes of the geometric objects such as the size and color as seen in the Gapminder data example in Figure \@ref(fig:gapminder). +We created these visualizations using the "Grammar of Graphics", which maps variables in a data frame to the aesthetic attributes of one the above 5 `geom`etric objects. We can also control other aesthetic attributes of the geometric objects such as the size and color as seen in the Gapminder data example in Figure \@ref(fig:gapminder). -Furthermore in Section \@ref(whats-to-come-3) we discussed that for two of our visualizations, we needed transformed/modified versions of existing data frames. Recall for example the scatterplot of departure and arrival delay *only* for Alaska Airlines flights. In order to create this visualization, we needed to first pare down the `flights` data frame to a new data frame `alaska_flights` consisting of only `carrier == AS` flights using the `filter()` function. +Recall however in Section \@ref(whats-to-come-3) we discussed that for two of our visualizations we needed transformed/modified versions of existing data frames. Recall for example the scatterplot of departure and arrival delay *only* for Alaska Airlines flights. In order to create this visualization, we needed to first pare down the `flights` data frame to a new data frame `alaska_flights` consisting of only `carrier == "AS"` flights using the `filter()` function. ```{r, eval=FALSE} alaska_flights <- flights %>% @@ -48,13 +48,15 @@ ggplot(data = alaska_flights, mapping = aes(x = dep_delay, y = arr_delay)) + In this chapter, we'll introduce a series of functions from the `dplyr` package that will allow you to take a data frame and 1. `filter()` its existing rows to only pick out a subset of them. For example, the `alaska_flights` data frame above. -1. `summarize()` one of its columns/variables with a *summary statistic*. 
For example, the median and interquartile range of temperatures as we saw in Section \@ref(boxplots) on boxplots. -1. `group_by()` its rows. In other words assign different rows to be part of the same *group* and thus report summary statistics for each group separately. For example, perhaps you want not the overall average departure delay `dep_delay` for all three `origin` airports combined, but the average departure delay for each of the three `origin` airports separately. +1. `summarize()` one of its columns/variables with a *summary statistic*. Examples include the median and interquartile range of temperatures as we saw in Section \@ref(boxplots) on boxplots. +1. `group_by()` its rows. In other words assign different rows to be part of the same *group* and report summary statistics for each group separately. For example, say perhaps you don't want a single overall average departure delay `dep_delay` for all three `origin` airports combined, but rather three separate average departure delays, one for each of the three `origin` airports. 1. `mutate()` its existing columns/variables to create new ones. For example, convert hourly temperature recordings from °F to °C. 1. `arrange()` its rows. For example, sort the rows of `weather` in ascending or descending order of `temp`. 1. `join()` it with another data frame by matching along a "key" variable. In other words, merge these two data frames together. -Notice how we used computer code type font to describe the actions we want to take on our data frames. This is because the `dplyr` package have intuitively verb-named functions that are easy to remember. We'll start by introducing the pipe operator `%>%`, which allows you to combine multiple data wrangling verb-named functions into a single sequential *chain* of actions. +Notice how we used `computer code` font to describe the actions we want to take on our data frames. This is because the `dplyr` package for data wrangling that we'll introduce in this chapter has intuitively verb-named functions that are easy to remember. + +We'll start by introducing the pipe operator `%>%`, which allows you to combine multiple data wrangling verb-named functions into a single sequential *chain* of actions. ### Needed packages {-} @@ -76,13 +78,13 @@ library(readr) ---- +*** ## The pipe operator: `%>%` {#piping} -Before we dig into data wrangling, let's first introduce a very nifty tool that gets loaded along with the `dplyr` package: the pipe operator `%>%`. Let's say you would like to perform this sequence of operations in R: +Before we start data wrangling, let's first introduce a very nifty tool that gets loaded along with the `dplyr` package: the pipe operator `%>%`. Say you would like to perform a hypothetical sequence of operations on a hypothetical data frame `x` using hypothetical functions `f()`, `g()`, and `h()`: 1. Take `x` *then* 1. Use `x` as an input to a function `f()` *then* @@ -95,7 +97,7 @@ One way to achieve this sequence of operations is by using nesting parentheses a h(g(f(x))) ``` -In this case, the above code isn't so hard to read since we are applying only three functions: `f()`, then `g()`, then `h()`. However, you can imagine this can get progressively harder and harder to read as the number of functions applied in your sequence increases. This is where the pipe operator `%>%` (pronounced "then") comes in handy. `%>%` takes one output of one function and then "pipes" it to be the input of the next function. 
For example: you can obtain the same output as the above sequence of operations as follows: +The above code isn't so hard to read since we are applying only three functions: `f()`, then `g()`, then `h()`. However, you can imagine that this can get progressively harder and harder to read as the number of functions applied in your sequence increases. This is where the pipe operator `%>%` comes in handy. `%>%` takes one output of one function and then "pipes" it to be the input of the next function. Furthermore, a helpful trick is to read `%>%` as "then." For example, you can obtain the same output as the above sequence of operations as follows: ```{r, eval = FALSE} x %>% @@ -111,7 +113,7 @@ You would read this above sequence as: 1. Use this output as the input to the next function `g()` *then* 1. Use this output as the input to the next function `h()` -So while both approaches above would achieve the same goal, the latter is much more human-readable because you can read the sequence of operations line-by-line. But what are `x`, `f()`, `g()`, and `h()`? Throughout this chapter on data wrangling: +So while both approaches above would achieve the same goal, the latter is much more human-readable because you can read the sequence of operations line-by-line. But what are the hypothetical `x`, `f()`, `g()`, and `h()`? Throughout this chapter on data wrangling: * The starting value `x` will be a data frame. For example: `flights`. * The sequence of functions, here `f()`, `g()`, and `h()`, will be a sequence of any number of the 6 data wrangling verb-named functions we listed in the introduction to this chapter. For example: `filter(carrier == "AS")`. @@ -124,12 +126,11 @@ alaska_flights <- flights %>% filter(carrier == "AS") ``` +Keep in mind, there are many more advanced data wrangling functions than just the 6 listed in the introduction to this chapter; you'll see some examples of these near in Section \@ref(other-verbs). However, just with these 6 verb-named functions you'll be able to perform a broad array of data wrangling tasks for the rest of this book. -Keep in mind, there are many more advanced data wrangling functions than just the 6 listed in the introduction to this chapter; you'll see some examples of these near in Section \@ref(other-verbs). However, just with these 6 verb-named functions you'll be able to perform a broad array of data wrangling tasks. - ---- +*** @@ -139,7 +140,7 @@ Keep in mind, there are many more advanced data wrangling functions than just th knitr::include_graphics("images/filter.png") ``` -The `filter()` function here works much like the "Filter" option in Microsoft Excel; it allows you to specify criteria about values of a variable in your dataset and then chooses only those rows that match that criteria. We begin by focusing only on flights from New York City to Portland, Oregon. The `dest` code (or airport code) for Portland, Oregon is `"PDX"`. Run the following and look at the resulting spreadsheet to ensure that only flights heading to Portland are chosen here: +The `filter()` function here works much like the "Filter" option in Microsoft Excel; it allows you to specify criteria about the values of a variables in your dataset and then filters out only those rows that match that criteria. We begin by focusing only on flights from New York City to Portland, Oregon. The `dest` code (or airport code) for Portland, Oregon is `"PDX"`. 
Run the following and look at the resulting spreadsheet to ensure that only flights heading to Portland are chosen here: ```{r, eval=FALSE} portland_flights <- flights %>% @@ -150,38 +151,48 @@ View(portland_flights) Note the following: * The ordering of the commands: - + Take the data frame `flights` *then* + + Take the `flights` data frame `flights` *then* + `filter` the data frame so that only those where the `dest` equals `"PDX"` are included. -* The double equal sign `==` for testing for equality, and not `=`. You are almost guaranteed to make the mistake at least once of only including one equals sign. - -You can combine multiple criteria together using operators that make comparisons: - -- `|` corresponds to "or" -- `&` corresponds to "and" - -We can often skip the use of `&` and just separate our conditions with a comma. You'll see this in the example below. +* We test for equality using the double equal sign `==` and not a single equal sign `=`. In other words `filter(dest = "PDX")` will yield an error. This is a convention across many programming languages. If you are new to coding, you'll probably forget to use the double equal sign `==` a few times before you get the hang of it. -In addition, you can use other mathematical checks (similar to `==`): +You can use other mathematical operations beyond just `==` to form criteria: - `>` corresponds to "greater than" - `<` corresponds to "less than" - `>=` corresponds to "greater than or equal to" - `<=` corresponds to "less than or equal to" -- `!=` corresponds to "not equal to" +- `!=` corresponds to "not equal to". The `!` is used in many programming languages to indicate "not". -To see many of these in action, let's select all flights that left JFK airport heading to Burlington, Vermont (`"BTV"`) or Seattle, Washington (`"SEA"`) in the months of October, November, or December. Run the following +Furthermore, you can combine multiple criteria together using operators that make comparisons: + +- `|` corresponds to "or" +- `&` corresponds to "and" + +To see many of these in action, let's filter `flights` for all rows that: + +* Departed from JFK airport and +* Were heading to Burlington, Vermont (`"BTV"`) or Seattle, Washington (`"SEA"`) and +* Departed in the months of October, November, or December. + +Run the following: ```{r, eval=FALSE} btv_sea_flights_fall <- flights %>% - filter(origin == "JFK", - dest == "BTV" | dest == "SEA", - month >= 10) + filter(origin == "JFK" & (dest == "BTV" | dest == "SEA") & month >= 10) View(btv_sea_flights_fall) ``` -Note: even though colloquially speaking one might say "all flights leaving Burlington, Vermont *and* Seattle, Washington," in terms of computer logical operations, we really mean "all flights leaving Burlington, Vermont *or* Seattle, Washington." For a given row in the data, `dest` can be "BTV", "SEA", or something else, but not "BTV" and "SEA" at the same time. +Note that even though colloquially speaking one might say "all flights leaving Burlington, Vermont *and* Seattle, Washington," in terms of computer operations, we really mean "all flights leaving Burlington, Vermont *or* leaving Seattle, Washington." For a given row in the data, `dest` can be "BTV", "SEA", or something else, but not "BTV" and "SEA" at the same time. Furthermore, note the careful use of parentheses around the `dest == "BTV" | dest == "SEA"`. -Another example uses the `!` to pick rows that *don't* match a condition. The `!` can be read as "not." 
Here we are selecting rows corresponding to flights that didn't go to Burlington, VT or Seattle, WA. +We can often skip the use of `&` and just separate our conditions with a comma. In other words the code above will return the identical output `btv_sea_flights_fall` as this code below: + +```{r, eval=FALSE} +btv_sea_flights_fall <- flights %>% + filter(origin == "JFK", (dest == "BTV" | dest == "SEA"), month >= 10) +View(btv_sea_flights_fall) +``` + +Let's present another example that uses the `!` "not" operator to pick rows that *don't* match a criteria. As mentioned earlier, the `!` can be read as "not." Here we are filtering rows corresponding to flights that didn't go to Burlington, VT or Seattle, WA. ```{r, eval=FALSE} not_BTV_SEA <- flights %>% @@ -189,6 +200,15 @@ not_BTV_SEA <- flights %>% View(not_BTV_SEA) ``` +Again, note the careful use of parentheses around the `(dest == "BTV" | dest == "SEA")`. If we didn't use parentheses as follows: + +```{r, eval=FALSE} +flights %>% + filter(!dest == "BTV" | dest == "SEA") +``` + +We would be returning all flights not headed to `"BTV"` *or* those headed to `"SEA"`, which is an entirely different resulting data frame. + Now say we have a large list of airports we want to filter for, say `BTV`, `SEA`, `PDX`, `SFO`, and `BDL`. We could continue to use the `|` or operator as so: ```{r, eval=FALSE} @@ -197,7 +217,7 @@ many_airports <- flights %>% View(many_airports) ``` -but as we progressively include more airports, this will get unwieldly. A slightly shorter approach uses the `%in%` operator: +but as we progressively include more airports, this will get unwieldy. A slightly shorter approach uses the `%in%` operator: ```{r, eval=FALSE} many_airports <- flights %>% @@ -205,28 +225,28 @@ many_airports <- flights %>% View(many_airports) ``` -What this code is doing is its filtering for all flights where `dest` is in the list of airports `c("BTV", "SEA", "PDX", "SFO", "BDL")`. Both outputs of `many_airports` are the same, but as you can see the latter takes much less time to code. +What this code is doing is filtering `flights` for all flights where `dest` is in the list of airports `c("BTV", "SEA", "PDX", "SFO", "BDL")`. Recall from Chapter \@ref(getting-started) that the `c()` function "combines" or "concatenates" values in a vector of values. Both outputs of `many_airports` are the same, but as you can see the latter takes much less time to code. -As a final note we point out that `filter()` should often be among the first verbs you apply to your data. This cleans your dataset to only those rows you care about, or put differently, it narrows down the scope to just the observations your care about. +As a final note we point out that `filter()` should often be among the first verbs you apply to your data. This cleans your dataset to only those rows you care about, or put differently, it narrows down the scope of your data frame to just the observations your care about. ```{block lc-filter, type='learncheck', purl=FALSE} **_Learning check_** ``` -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What's another way using the "not" operator `!` we could filter only the rows that are not going to Burlington, VT nor Seattle, WA in the `flights` data frame? Test this out using the code above. +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What's another way of using the "not" operator `!` to filter only the rows that are not going to Burlington VT nor Seattle WA in the `flights` data frame? Test this out using the code above. 
```{block, type='learncheck', purl=FALSE} ``` ---- +*** ## `summarize` variables {#summarize} -The next common task when working with data is to be able to summarize data: take a large number of values and summarize them with a single value. While this may seem like a very abstract idea, something as simple as the sum, the smallest value, and the largest values are all summaries of a large number of values. +The next common task when working with data is to return *summary statistics*: a single numerical value that summarizes a large number of values, for example the mean/average or the median. Other examples of summary statistics that might not immediately come to mind include the sum, the smallest value AKA the minimum, the largest value AKA the maximum, and the standard deviation; they are all summaries of a large number of values. ```{r sum1, echo=FALSE, fig.cap="Summarize diagram from Data Wrangling with dplyr and tidyr cheatsheet", purl=FALSE} knitr::include_graphics("images/summarize1.png") @@ -237,25 +257,15 @@ knitr::include_graphics("images/summary.png") options(knitr.kable.NA = 'NA') ``` -We can calculate the standard deviation and mean of the temperature variable `temp` in the `weather` data frame of `nycflights13` in one step using the `summarize` (or equivalently using the UK spelling `summarise`) function in `dplyr` (See Appendix \@ref(appendixA)): +Let's calculate the mean and the standard deviation of the temperature variable `temp` in the `weather` data frame included in the `nycflights13` package (See Appendix \@ref(appendixA)). We'll do this in one step using the `summarize()` function from the `dplyr` package and save the results in a new data frame `summary_temp` with columns/variables `mean` and the `std_dev`. Note you can also use the UK spelling of "summarise" using the `summarise()` function. -```{r, eval=FALSE} +As shown in Figures \@ref(fig:sum1) and \@ref(fig:sum2), the `weather` data frame's many rows will be collapsed into a single row of just the summary values, in this case the mean and standard deviation: + +```{r, eval=TRUE} summary_temp <- weather %>% - summarize(mean = mean(temp), - std_dev = sd(temp)) + summarize(mean = mean(temp), std_dev = sd(temp)) summary_temp ``` - - - -``` -# A tibble: 1 x 2 - mean std_dev - -1 NA NA -``` ```{r, echo=FALSE, eval=FALSE} options(knitr.kable.NA = '') summary_temp <- weather %>% @@ -266,21 +276,19 @@ kable(summary_temp) %>% latex_options = c("HOLD_position")) ``` -We've created a small data frame here called `summary_temp` that includes both the `mean` and the `std_dev` of the `temp` variable in `weather`. Notice as shown in Figures \@ref(fig:sum1) and \@ref(fig:sum2), the data frame `weather` went from many rows to a single row of just the summary values in the data frame `summary_temp`. - -But why are the values returned `NA`? This stands for "not available or not applicable" and is how R encodes *missing values*; if in a data frame for a particular row and column no value exists, `NA` is stored instead. Furthermore, by default any time you try to summarize a number of values (using `mean()` and `sd()` for example) that has one or more missing values, then `NA` is returned. +Why are the values returned `NA`? As we saw in Section \@ref(geompoint) when creating the scatterplot of departure and arrival delays for `alaska_flights`, `NA` is how R encodes *missing values* where `NA` indicates "not available" or "not applicable." 
If a value for a particular row and a particular column does not exist, `NA` is stored instead. Values can be missing for many reasons. Perhaps the data was collected but someone forgot to enter it? Perhaps the data was not collected at all because it was too difficult? Perhaps there was an erroneous value that someone entered that has been corrected to read as missing? You'll often encounter issues with missing values when working with real data.

-Values can be missing for many reasons. Perhaps the data was collected but someone forgot to enter it? Perhaps the data was not collected at all because it was too difficult? Perhaps there was an erroneous value that someone entered that has been correct to read as missing? You'll often encounter issues with missing values.

+Going back to our `summary_temp` output above, by default any time you try to calculate a summary statistic of a variable that has one or more `NA` missing values in R, then `NA` is returned. To work around this fact, you can set the `na.rm` argument to `TRUE`, where `rm` is short for "remove"; this will ignore any `NA` missing values and only return the summary value for all non-missing values.

-You can summarize all non-missing values by setting the `na.rm` argument to TRUE (`rm` is short for "remove"). This will remove any `NA` missing values and only return the summary value for all non-missing values. So the code below computes the mean and standard deviation of all non-missing values. Notice how the `na.rm=TRUE` are set as arguments to the `mean()` and `sd()` functions, and not to the `summarize()` function.

+The code below computes the mean and standard deviation of all non-missing values of `temp`. Notice how `na.rm = TRUE` is used as an argument to the `mean()` and `sd()` functions individually, and not to the `summarize()` function.

-```{r, eval=FALSE}
+```{r, eval = TRUE}
summary_temp <- weather %>% 
  summarize(mean = mean(temp, na.rm = TRUE), 
            std_dev = sd(temp, na.rm = TRUE))
summary_temp
```
-```{r, echo=FALSE}
+```{r, echo=FALSE, eval=FALSE}
summary_temp <- weather %>% 
  summarize(mean = mean(temp, na.rm = TRUE), 
            std_dev = sd(temp, na.rm = TRUE))
kable(summary_temp) %>% 
  latex_options = c("HOLD_position"))
```

-It is not good practice to include a `na.rm = TRUE` in your summary commands by default; you should attempt to run code first without this argument as this will alert you to the presence of missing data. Only after you've identified where missing values occur and have thought about the potential causes of this missing should you consider using `na.rm = TRUE`. In the upcoming Learning Checks we'll consider the possible ramifications of blindly sweeping rows with missing values under the rug.

+However, one needs to be cautious whenever ignoring missing values as we've done above. In the upcoming Learning Checks we'll consider the possible ramifications of blindly sweeping rows with missing values "under the rug." This is in fact why the `na.rm` argument to any summary statistic function in R is set to `FALSE` by default; in other words, do not ignore rows with missing values by default. R is alerting you to the presence of missing data and you should be mindful of this missingness and any potential causes of this missingness throughout your analysis.

-What other summary functions can we use inside the `summarize()` verb? Any function in R that takes a vector of values and returns just one.
Here are just a few: +What are other summary statistic functions can we use inside the `summarize()` verb? As seen in Figure \@ref(fig:sum2), you can use any function in R that takes many values and returns just one. Here are just a few: * `mean()`: the mean AKA the average * `sd()`: the standard deviation, which is a measure of spread @@ -329,7 +329,7 @@ summary_temp <- weather %>% ---- +*** @@ -339,14 +339,7 @@ summary_temp <- weather %>% knitr::include_graphics("images/group_summary.png") ``` -It's often more useful to summarize a variable based on the groupings of another variable. Let's say, we are interested in the mean and standard deviation of temperatures but *grouped by month*. To be more specific: we want the mean and standard deviation of temperatures - -1. split by month. -1. sliced by month. -1. aggregated by month. -1. collapsed over month. - -Run the following code: +Say instead of the a single mean temperature for the whole year, you would like 12 mean temperatures, one for each of the 12 months separately? In other words, we would like to compute the mean temperature split by month AKA sliced by month AKA aggregated by month. We can do this by "grouping" temperature observations by the values of another variable, in this case by the 12 values of the variable `month`. Run the following code: ```{r, eval=FALSE} summary_monthly_temp <- weather %>% @@ -365,21 +358,50 @@ kable(summary_monthly_temp) %>% latex_options = c("HOLD_position")) ``` -This code is identical to the previous code that created `summary_temp`, with an extra `group_by(month)` added. Grouping the `weather` dataset by `month` and then passing this new data frame into `summarize` yields a data frame that shows the mean and standard deviation of temperature for each month in New York City. Note: Since each row in `summary_monthly_temp` represents a summary of different rows in `weather`, the observational units have changed. +This code is identical to the previous code that created `summary_temp`, but with an extra `group_by(month)` added before the `summarize()`. Grouping the `weather` dataset by `month` and then applying the `summarize()` functions yields a data frame that displays the mean and standard deviation temperature split by the 12 months of the year. + +It is important to note that the `group_by()` function doesn't change data frames by itself. Rather it changes the *meta-data*, or data about the data, specifically the group structure. It is only after we apply the `summarize()` function that the data frame changes. For example, let's consider the `diamonds` data frame included in the `ggplot2` package. Run this code, specifically in the console: -It is important to note that `group_by` doesn't change the data frame. It sets *meta-data* (data about the data), specifically the group structure of the data. It is only after we apply the `summarize` function that the data frame changes. +```{r, eval=TRUE} +diamonds +``` -If we would like to remove this group structure meta-data, we can pipe the resulting data frame into the `ungroup()` function. For example, say the group structure meta-data is set to be by month via `group_by(month)`, all future summarizations will be reported on a month-by-month basis. If however, we would like to no longer have this and have all summarizations be for all data in a single group (in this case over the entire year of 2013), then pipe the data frame in question through and `ungroup()` to remove this. 
+Observe that the first line of the output reads `# A tibble: 53,940 x 10`. This is an example of meta-data, in this case the number of observations/rows and variables/columns in `diamonds`. The actual data itself are the subsequent table of values. -We now revisit the `n()` counting summary function we introduced in the previous section. For example, suppose we'd like to get a sense for how many flights departed each of the three airports in New York City: +Now let's pipe the `diamonds` data frame into `group_by(cut)`. Run this code, specifically in the console: -```{r, eval=FALSE} +```{r, eval=TRUE} +diamonds %>% + group_by(cut) +``` + +Observe that now there is additional meta-data: `# Groups: cut [5]` indicating that the grouping structure meta-data has been set based on the 5 possible values AKA levels of the categorical variable `cut`: `"Fair"`, `"Good"`, `"Very Good"`, `"Premium"`, `"Ideal"`. On the other hand observe that the data has not changed: it is still a table of 53,940 $\times$ 10 values. + +Only by combining a `group_by()` with another data wrangling operation, in this case `summarize()` will the actual data be transformed. + +```{r, eval=TRUE} +diamonds %>% + group_by(cut) %>% + summarize(avg_price = mean(price)) +``` + +If we would like to remove this group structure meta-data, we can pipe the resulting data frame into the `ungroup()` function. Observe how the `# Groups: cut [5]` meta-data is no longer present. Run this code, specifically in the console: + +```{r, eval=TRUE} +diamonds %>% + group_by(cut) %>% + ungroup() +``` + +Let's now revisit the `n()` counting summary function we introduced in the previous section. For example, suppose we'd like to count how many flights departed each of the three airports in New York City: + +```{r, eval=TRUE} by_origin <- flights %>% group_by(origin) %>% summarize(count = n()) by_origin ``` -```{r, echo=FALSE} +```{r, echo=FALSE, eval=FALSE} by_origin <- flights %>% group_by(origin) %>% summarize(count = n()) @@ -388,12 +410,12 @@ kable(by_origin) %>% latex_options = c("HOLD_position")) ``` -We see that Newark (`"EWR"`) had the most flights departing in 2013 followed by `"JFK"` and lastly by LaGuardia (`"LGA"`). Note there is a subtle but important difference between `sum()` and `n()`. While `sum()` simply adds up a large set of numbers, the latter counts the number of times each of many different values occur. +We see that Newark (`"EWR"`) had the most flights departing in 2013 followed by `"JFK"` and lastly by LaGuardia (`"LGA"`). Note there is a subtle but important difference between `sum()` and `n()`; While `sum()` returns the sum of a numerical variable, `n()` returns counts of the the number of rows/observations. ### Grouping by more than one variable -You are not limited to grouping by one variable! Say you wanted to know the number of flights leaving each of the three New York City airports *for each month*, we can also group by a second variable `month`: `group_by(origin, month)`. +You are not limited to grouping by one variable! Say you wanted to know the number of flights leaving each of the three New York City airports *for each month*, we can also group by a second variable `month`: `group_by(origin, month)`. We see there are 36 rows to `by_origin_monthly` because there are 12 months for 3 airports (`EWR`, `JFK`, and `LGA`). 
```{r} by_origin_monthly <- flights %>% @@ -402,7 +424,7 @@ by_origin_monthly <- flights %>% by_origin_monthly ``` -We see there are 36 rows to `by_origin_monthly` because there are 12 months times 3 airports (`EWR`, `JFK`, and `LGA`). Why do we `group_by(origin, month)` and not `group_by(origin)` and then `group_by(month)`? Let's investigate: +Why do we `group_by(origin, month)` and not `group_by(origin)` and then `group_by(month)`? Let's investigate: ```{r} by_origin_monthly_incorrect <- flights %>% @@ -412,20 +434,7 @@ by_origin_monthly_incorrect <- flights %>% by_origin_monthly_incorrect ``` -What happened here is that the second `group_by(month)` overrode the first `group_by(origin)`, so that in the end we are only grouping by `month`. The lesson here, is if you want to `group_by()` two or more variables, you should include all these variables in a single `group_by()` function call. - - - - +What happened here is that the second `group_by(month)` overrode the group structure meta-data of the first `group_by(origin)`, so that in the end we are only grouping by `month`. The lesson here is if you want to `group_by()` two or more variables, you should include all these variables in a single `group_by()` function call. ```{block lc-groupby, type='learncheck', purl=FALSE} **_Learning check_** @@ -446,7 +455,7 @@ by_monthly_origin ---- +*** @@ -456,7 +465,39 @@ by_monthly_origin knitr::include_graphics("images/mutate.png") ``` -When looking at the `flights` dataset, there are some clear additional variables that could be calculated based on the values of variables already in the dataset. Passengers are often frustrated when their flights depart late, but change their mood a bit if pilots can make up some time during the flight to get them to their destination close to when they expected to land. This is commonly referred to as "gain" and we will create this variable using the `mutate` function. Note that we have also overwritten the `flights` data frame with what it was before as well as an additional variable `gain` here, or put differently, the `mutate()` command outputs a new data frame which then gets saved over the original `flights` data frame. +Another common transformation of data is to create/compute new variables based on existing ones. For example, say you are more comfortable thinking of temperature in degrees Celsius °C and not degrees Farenheit °F. The formula to convert temperatures from °F to °C is: + +$$ +\text{temp in C} = \frac{\text{temp in F} - 32}{1.8} +$$ + +We can apply this formula to the `temp` variable using the `mutate()` function, which takes existing variables and mutates them to create new ones. + +```{r, eval=FALSE} +weather <- weather %>% + mutate(temp_in_C = (temp-32)/1.8) +View(weather) +```` +```{r, eval=TRUE, echo=FALSE} +weather <- weather %>% + mutate(temp_in_C = (temp-32)/1.8) +```` + +Note that we have overwritten the original `weather` data frame with a new version that now includes the additional variable `temp_in_C`. In other words, the `mutate()` command outputs a new data frame which then gets saved over the original `weather` data frame. Furthermore, note how in `mutate()` we used `temp_in_C = (temp-32)/1.8` to create a new variable `temp_in_C`. + +Why did we overwrite the data frame `weather` instead of assigning the result to a new data frame like `weather_new`, but on the other hand why did we *not* overwrite `temp`, but instead created a new variable called `temp_in_C`? 
As a rough rule of thumb, as long as you are not losing original information that you might need later, it's acceptable practice to overwrite existing data frames. On the other hand, had we used `mutate(temp = (temp-32)/1.8)` instead of `mutate(temp_in_C = (temp-32)/1.8)`, we would have overwritten the original variable `temp` and lost its values. + +Let's compute average monthly temperatures in both °F and °C using the similar `group_by()` and `summarize()` code as in the previous section. + +```{r} +summary_monthly_temp <- weather %>% + group_by(month) %>% + summarize(mean_temp_in_F = mean(temp, na.rm = TRUE), + mean_temp_in_C = mean(temp_in_C, na.rm = TRUE)) +summary_monthly_temp +```` + +Let's consider another example. Passengers are often frustrated when their flights depart late, but change their mood a bit if pilots can make up some time during the flight to get them to their destination close to the original arrival time. This is commonly referred to as "gain" and we will create this variable using the `mutate()` function. ```{r} flights <- flights %>% @@ -473,8 +514,6 @@ flights %>% The flight in the first row departed 2 minutes late but arrived 11 minutes late, so its "gained time in the air" is actually a loss of 9 minutes, hence its `gain` is `-9`. Contrast this to the flight in the fourth row which departed a minute early (`dep_delay` of `-1`) but arrived 18 minutes early (`arr_delay` of `-18`), so its "gained time in the air" is 17 minutes, hence its `gain` is `+17`. -Why did we overwrite `flights` instead of assigning the resulting data frame to a new object, like `flights_with_gain`? As a rough rule of thumb, as long as you are not losing information that you might need later, it's acceptable practice to overwrite data frames. However, if you overwrite existing variables and/or change the observational units, recovering the original information might prove difficult. In this case, it might make sense to create a new data object. - Let's look at summary measures of this `gain` variable and even plot it in the form of a histogram: ```{r, eval=FALSE} @@ -541,15 +580,15 @@ flights <- flights %>% ---- +*** ## `arrange` and sort rows {#arrange} -One of the most common things people working with data would like to do is sort the data frames by a specific variable in a column. Have you ever been asked to calculate a median by hand? This requires you to put the data in order from smallest to highest in value. The `dplyr` package has a function called `arrange` that we will use to sort/reorder our data according to the values of the specified variable. This is often used after we have used the `group_by` and `summarize` functions as we will see. +One of the most common tasks people working with data would like to perform is sort the data frame's rows in alphanumeric order of the values in a variable/column. For example, when calculating a median by hand requires you to first sort the data from the smallest to highest in value and then identify the "middle" value. The `dplyr` package has a function called `arrange()` that we will use to sort/reorder a data frame's rows according to the values of the specified variable. This is often used after we have used the `group_by()` and `summarize()` functions as we will see. 
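+As a quick first taste of the syntax before the fuller example below (and assuming the `dplyr` and `nycflights13` packages are loaded as elsewhere in this chapter), here is a minimal sketch that sorts the rows of the `weather` data frame by temperature:
+
+```{r, eval=FALSE}
+# Sort hourly weather records in ascending order of temp:
+weather %>% 
+  arrange(temp)
+
+# Sort them in descending order of temp instead:
+weather %>% 
+  arrange(desc(temp))
+```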
-Let's suppose we were interested in determining the most frequent destination airports from New York City in 2013:
+Let's suppose we were interested in determining the most frequent destination airports for all domestic flights departing from New York City in 2013:

```{r, eval}
freq_dest <- flights %>% 
  group_by(dest) %>% 
  summarize(num_flights = n())
freq_dest
```

-You'll see that by default the values of `dest` are displayed in alphabetical order here. We are interested in finding those airports that appear most:
+Observe that by default the rows of the resulting `freq_dest` data frame are sorted in alphabetical order of the `dest` destination airport code. Say instead we would like to see the same data, but sorted from the most to the least number of flights `num_flights`:

```{r}
freq_dest %>% 
  arrange(num_flights)
```

-This is actually giving us the opposite of what we are looking for. It tells us the least frequent destination airports first. To switch the ordering to be descending instead of ascending we use the `desc` (`desc`ending) function:
+This is actually giving us the opposite of what we are looking for: the rows are sorted with the least frequent destination airports displayed first. To switch the ordering to be descending instead of ascending we use the `desc()` function, which is short for "descending":

```{r}
freq_dest %>% 
  arrange(desc(num_flights))
```

+In other words, `arrange()` sorts in ascending order by default unless you override this default behavior by using `desc()`.

----
+
+***

## `join` data frames {#joins}

-Another common task is joining AKA merging two different datasets. For example, in the `flights` data, the variable `carrier` lists the carrier code for the different flights. While `"UA"` and `"AA"` might be somewhat easy to guess for some (United and American Airlines), what are "VX", "HA", and "B6"? This information is provided in a separate data frame `airlines`.
+Another common data transformation task is "joining" or "merging" two different datasets. For example, in the `flights` data frame the variable `carrier` lists the carrier code for the different flights. While the corresponding airline names for `"UA"` and `"AA"` might be somewhat easy to guess (United and American Airlines), what airlines have the codes `"VX"`, `"HA"`, and `"B6"`? This information is provided in a separate data frame `airlines`.

```{r eval=FALSE}
View(airlines)
```

-We see that in `airports`, `carrier` is the carrier code while `name` is the full name of the airline. Using this table, we can see that "VX", "HA", and "B6" correspond to Virgin America, Hawaiian Airlines, and JetBlue respectively. However, will we have to continually look up the carrier's name for each flight in the `airlines` dataset? No! Instead of having to do this manually, we can have R automatically do the "looking up" for us.
+We see that in `airlines`, `carrier` is the carrier code while `name` is the full name of the airline company. Using this table, we can see that `"VX"`, `"HA"`, and `"B6"` correspond to Virgin America, Hawaiian Airlines, and JetBlue respectively. However, wouldn't it be nice to have all this information in a single data frame instead of two separate data frames? We can do this by "joining" i.e. "merging" the `flights` and `airlines` data frames.

-Note that the values in the variable `carrier` in `flights` match the values in the variable `carrier` in `airlines`. In this case, we can use the variable `carrier` as a *key variable* to join/merge/match the two data frames by.
Key variables are almost always identification variables that uniquely identify the observational units as we saw back in Subsection \@ref(identification-vs-measurement) on identification vs measurement variables. This ensures that rows in both data frames are appropriate matched during the join. Hadley and Garrett [@rds2016] created the following diagram to help us understand how the different datasets are linked by various key variables:
+Note that the values in the variable `carrier` in the `flights` data frame match the values in the variable `carrier` in the `airlines` data frame. In this case, we can use the variable `carrier` as a *key variable* to match the rows of the two data frames. Key variables are almost always identification variables that uniquely identify the observational units as we saw in Subsection \@ref(identification-vs-measurement-variables). This ensures that rows in both data frames are appropriately matched during the join. Hadley and Garrett [@rds2016] created the following diagram to help us understand how the different datasets are linked by various key variables:

```{r reldiagram, echo=FALSE, fig.cap="Data relationships in nycflights13 from R for Data Science", purl=FALSE}
knitr::include_graphics("images/relational-nycflights.png")
```

-### Joining by "key" variables
+### Matching "key" variable names

-In both `flights` and `airlines`, the key variable we want to join/merge/match the two data frames with has the same name in both datasets: `carriers`. We make use of the `inner_join()` function to join by the variable `carrier`.
+In both the `flights` and `airlines` data frames, the key variable we want to join/merge/match the rows of the two data frames by has the same name: `carrier`. We make use of the `inner_join()` function to join the two data frames, where the rows will be matched by the variable `carrier`.

```{r eval=FALSE}
flights_joined <- flights %>% 
  inner_join(airlines, by = "carrier")
View(flights)
View(flights_joined)
```

-We observed that the `flights` and `flights_joined` are identical except that `flights_joined` has an additional variable `name` whose values were drawn from `airlines`.
+Observe that the `flights` and `flights_joined` data frames are identical except that `flights_joined` has an additional variable `name` whose values correspond to the airline company names drawn from the `airlines` data frame.

-A visual representation of the `inner_join` is given below [@rds2016]:
+A visual representation of the `inner_join()` is given below [@rds2016]. There are other types of joins available (such as `left_join()`, `right_join()`, `full_join()`, and `anti_join()`), but the `inner_join()` will solve nearly all of the problems you'll encounter in this book.

```{r ijdiagram, echo=FALSE, fig.cap="Diagram of inner join from R for Data Science", purl=FALSE}
knitr::include_graphics("images/join-inner.png")
```

-There are more complex joins available, but the `inner_join` will solve nearly all of the problems you'll face in our experience.

-### Joining by "key" variables with different names

-Say instead, you are interested in all the destinations of flights from NYC in 2013 and ask yourself:
+
+### Different "key" variable names {#diff-key}
+
+Say instead you are interested in the destinations of all domestic flights departing NYC in 2013 and ask yourself:

- "What cities are these airports in?"
- "Is `"ORD"` Orlando?
@@ -629,16 +670,17 @@ The `airports` data frame contains airport codes: View(airports) ``` -However, looking at both the `airports` and `flights` and the visual representation of the relations between the data frames in Figure \@ref(fig:ijdiagram), we see that in: +However, looking at both the `airports` and `flights` frames and the visual representation of the relations between these data frames in Figure \@ref(fig:ijdiagram) above, we see that in: -* `airports` the airport code is in the variable `faa` -* `flights` the airport code is in the variables `origin` and `dest` (destination) +* the `airports` data frame the airport code is in the variable `faa` +* the `flights` data frame the airport codes are in the variables `origin` and `dest` -So to join these two datasets so that we can identify the destination cities, our `inner_join` operation involves a `by` argument that accounts for the different names: +So to join these two data frames so that we can identify the destination cities for example, our `inner_join()` operation will use the `by = c("dest" = "faa")` argument, which allows us to join two data frames where the key variable has a different name: ```{r, eval=FALSE} -flights %>% +flights_with_airport_names <- flights %>% inner_join(airports, by = c("dest" = "faa")) +View(flights_with_airport_names) ``` Let's construct the sequence of commands that computes the number of flights from NYC to each destination, but also includes information about each destination airport: @@ -653,19 +695,18 @@ named_dests <- flights %>% named_dests ``` -In case you didn't know, `"ORD"` is the airport code of Chicago O'Hare airport and `"FLL"` is the main airport in Fort Lauderdale, Florida, which we can now see in our `named_dests` data frame. +In case you didn't know, `"ORD"` is the airport code of Chicago O'Hare airport and `"FLL"` is the main airport in Fort Lauderdale, Florida, which we can now see in the `airport_name` variable in the resulting `named_dests` data frame. -### Joining by multiple "key" variables +### Multiple "key" variables Say instead we are in a situation where we need to join by multiple variables. For example, in Figure \@ref(fig:reldiagram) above we see that in order to join the `flights` and `weather` data frames, we need more than one key variable: `year`, `month`, `day`, `hour`, and `origin`. This is because the combination of these 5 variables act to uniquely identify each observational unit in the `weather` data frame: hourly weather recordings at each of the 3 NYC airports. -We achieve this by specifying a vector of key variables to join by using the `c()` concatenate function. Note the individual variables need to be wrapped in quotation marks. +We achieve this by specifying a vector of key variables to join by using the `c()` function for "combine" or "concatenate" that we saw earlier: -```{r} +```{r, eval=FALSE} flights_weather_joined <- flights %>% - inner_join(weather, - by = c("year", "month", "day", "hour", "origin")) -flights_weather_joined + inner_join(weather, by = c("year", "month", "day", "hour", "origin")) +View(flights_weather_joined) ``` @@ -681,14 +722,43 @@ flights_weather_joined ``` +### Normal forms + +The data frames included in the `nycflights13` package are in a form that minimizes redundancy of data. For example, the `flights` data frame only saves the `carrier` code of the airline company; it does not include the actual name of the airline. 
For example the first row of `flights` has `carrier` equal to `UA`, but does it does not include the airline name "United Air Lines Inc." The names of the airline companies are included in the `name` variable of the `airlines` data frame. In order to have the airline company name included in `flights`, we could join these two data frames as follows: + +```{r eval=FALSE} +joined_flights <- flights %>% + inner_join(airlines, by = "carrier") +View(joined_flights) +``` + +We are capable of performing this join because each of the data frames have _keys_ in common to relate one to another: the `carrier` variable in both the `flights` and `airlines` data frames. The *key* variable(s) that we join are often *identification variables* we mentioned previously. + +This is an important property of what's known as **normal forms** of data. The process of decomposing data frames into less redundant tables without losing information is called **normalization**. More information is available on [Wikipedia](https://en.wikipedia.org/wiki/Database_normalization). + + +```{block, type='learncheck'} +**_Learning check_** +``` + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are some advantages of data in normal forms? What are some disadvantages? + +```{block, type='learncheck', purl=FALSE} +``` + + ---- +*** ## Other verbs {#other-verbs} -On top of the following examples of other verbs, if you'd like to see more examples on using `dplyr`, the data wrangling verbs we introduction in Section \@ref(verbs), and the pipe function `%>%` with the `nycflights13` dataset, check out [Chapter 5](http://r4ds.had.co.nz/transform.html) of Hadley and Garrett's book [@rds2016]. +Here are some other useful data wrangling verbs that might come in handy: + +* `select()` only a subset of variables/columns +* `rename()` variables/columns to have new names +* Return only the `top_n()` values of a variable ### `select` variables {#select} @@ -696,30 +766,30 @@ On top of the following examples of other verbs, if you'd like to see more examp knitr::include_graphics("images/select.png") ``` -We've seen that the `flights` data frame in the `nycflights13` package contains many different variables. The `names` function gives a listing of all the columns in a data frame; in our case you would run `names(flights)`. You can also identify these variables by running the `glimpse` function in the `dplyr` package: +We've seen that the `flights` data frame in the `nycflights13` package contains 19 different variables. You can identify the names of these 19 variables by running the `glimpse()` function from the `dplyr` package: ```{r, eval=FALSE} glimpse(flights) ``` -However, say you only want to consider two of these variables, say `carrier` and `flight`. You can `select` these: +However, say you only need two of these variables, say `carrier` and `flight`. You can `select()` these two variables: ```{r, eval=FALSE} flights %>% select(carrier, flight) ``` -This function makes navigating datasets with a very large number of variables easier for humans by restricting consideration to only those of interest, like `carrier` and `flight` above. So for example, this might make viewing the dataset using the `View()` spreadsheet viewer more digestible. However, as far as the computer is concerned it doesn't care how many additional variables are in the dataset in question, so long as `carrier` and `flight` are included. 
+This function makes exploring data frames with a very large number of variables easier for humans to process by restricting consideration to only those we care about, like our example with `carrier` and `flight` above. This might make viewing the dataset using the `View()` spreadsheet viewer more digestible. However, as far as the computer is concerned, it doesn't care how many additional variables are in the data frame in question, so long as `carrier` and `flight` are included. -Another example involves the variable `year`. If you remember the original description of the `flights` data frame (or by running `?flights`), you'll remember that this data correspond to flights in 2013 departing New York City. The `year` variable isn't really a variable here in that it doesn't vary... `flights` actually comes from a larger dataset that covers many years. We may want to remove the `year` variable from our dataset since it won't be helpful for analysis in this case. We can deselect `year` by using the `-` sign: +Let's say instead you want to drop i.e deselect certain variables. For example, take the variable `year` in the `flights` data frame. This variable isn't quite a "variable" in the sense that all the values are `2013` i.e. it doesn't change. Say you want to remove the `year` variable from the data frame; we can deselect `year` by using the `-` sign: ```{r, eval=FALSE} flights_no_year <- flights %>% select(-year) -names(flights_no_year) +glimpse(flights_no_year) ``` -Or we could specify a ranges of columns: +Another way of selecting columns/variables is by specifying a range of columns: ```{r, eval=FALSE} flight_arr_times <- flights %>% @@ -727,15 +797,15 @@ flight_arr_times <- flights %>% flight_arr_times ``` -The `select` function can also be used to reorder columns in combination with the `everything` helper function. Let's suppose we'd like the `hour`, `minute`, and `time_hour` variables, which appear at the end of the `flights` dataset, to actually appear immediately after the `day` variable: +The `select()` function can also be used to reorder columns in combination with the `everything()` helper function. Let's suppose we'd like the `hour`, `minute`, and `time_hour` variables, which appear at the end of the `flights` dataset, to appear immediately after the `year`, `month`, and `day` variables while keeping the rest of the variables. In the code below `everything()` picks up all remaining variables. ```{r, eval=FALSE} flights_reorder <- flights %>% - select(month:day, hour:time_hour, everything()) -names(flights_reorder) + select(year, month, day, hour, minute, time_hour, everything()) +glimpse(flights_reorder) ``` -in this case `everything()` picks up all remaining variables. Lastly, the helper functions `starts_with`, `ends_with`, and `contains` can be used to choose column names that match those conditions: +Lastly, the helper functions `starts_with()`, `ends_with()`, and `contains()` can be used to select variables/column that match those conditions. For example: ```{r, eval=FALSE} flights_begin_a <- flights %>% @@ -757,35 +827,28 @@ flights_time ### `rename` variables {#rename} -Another useful function is `rename`, which as you may suspect renames one column to another name. Suppose we wanted `dep_time` and `arr_time` to be `departure_time` and `arrival_time` instead in the `flights_time` data frame: +Another useful function is `rename()`, which as you may have guessed renames one column to another name. 
Suppose we want `dep_time` and `arr_time` to be `departure_time` and `arrival_time` instead in the `flights_time` data frame:

```{r, eval=FALSE}
flights_time_new <- flights %>% 
  select(contains("time")) %>% 
  rename(departure_time = dep_time,
         arrival_time = arr_time)
-names(flights_time)
+glimpse(flights_time_new)
```

-Note that in this case we used a single `=` sign with the `rename()`. Ex: `departure_time = dep_time`. This is because we are not testing for equality like we would using `==`, but instead we want to assign a new variable `departure_time` to have the same values as `dep_time` and then delete the variable `dep_time`.
-
-
-It's easy to forget if the new name comes before or after the equals sign. I usually remember this as "New Before, Old After" or NBOA. You'll receive an error if you try to do it the other way:
-
-```
-Error: Unknown variables: departure_time, arrival_time.
-```
+Note that in this case we used a single `=` sign within the `rename()`, for example `departure_time = dep_time`. This is because we are not testing for equality like we would using `==`, but instead we want to assign a new variable `departure_time` to have the same values as `dep_time` and then delete the variable `dep_time`. It's easy to forget if the new name comes before or after the equals sign. I usually remember this as "New Before, Old After" or NBOA. 

### `top_n` values of a variable

-We can also use the `top_n` function which automatically tells us the most frequent `num_flights`. We specify the top 10 airports here:
+We can also return the top `n` values of a variable using the `top_n()` function. For example, we can return a data frame of the top 10 destination airports using the example from Section \@ref(diff-key). Observe that we set the number of values to return to `n = 10` and `wt = num_flights` to indicate that we want the rows corresponding to the top 10 values of `num_flights`. See the help file for `top_n()` by running `?top_n` for more information.

```{r, eval=FALSE}
named_dests %>% 
  top_n(n = 10, wt = num_flights)
```

-We'll still need to arrange this by `num_flights` though:
+Let's further `arrange()` these results in descending order of `num_flights`:

```{r, eval=FALSE}
named_dests  %>% 
  top_n(n = 10, wt = num_flights) %>% 
@@ -793,18 +856,6 @@ named_dests %>%
   arrange(desc(num_flights))
```

-**Note:** Remember that I didn't pull the `n` and `wt` arguments out of thin air. They can be found by using the `?` function on `top_n`.
-
-We can go one stop further and tie together the `group_by` and `summarize` functions we used to find the most frequent flights:
-
-```{r, eval=FALSE}
-ten_freq_dests <- flights %>%
-  group_by(dest) %>%
-  summarize(num_flights = n()) %>%
-  arrange(desc(num_flights)) %>%
-  top_n(n = 10)
-View(ten_freq_dests)
-```

```{block lc-other-verbs, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -823,7 +874,7 @@ View(ten_freq_dests)


----
+***


@@ -831,7 +882,7 @@ View(ten_freq_dests)


### Summary table

-Let's recap a selection of verbs in Table \@ref(tab:wrangle-summary-table) summarizing their differences. Using these verbs and the pipe `%>%` operator from Section \@ref(piping), you'll be able to write easily legible code to perform almost all the data wrangling necessary for the rest of this book.
+Let's recap our data wrangling verbs in Table \@ref(tab:wrangle-summary-table). Using these verbs and the pipe `%>%` operator from Section \@ref(piping), you'll be able to write easily legible code to perform almost all the data wrangling necessary for the rest of this book.
```{r wrangle-summary-table, echo=FALSE, message=FALSE} # The following Google Doc is published to CSV and loaded below using read_csv() below: @@ -839,15 +890,15 @@ Let's recap a selection of verbs in Table \@ref(tab:wrangle-summary-table) summa "https://docs.google.com/spreadsheets/d/e/2PACX-1vRgwl1lugQA6zxzfB6_0hM5vBjXkU7cbUVYYXLcWeaRJ9HmvNXyCjzJCgiGW8HCe1kvjLCGYHf-BvYL/pub?gid=0&single=true&output=csv" %>% read_csv(na = "") %>% - rename_(" " = "X1") %>% + select(-X1) %>% kable( caption = "Summary of data wrangling verbs", booktabs = TRUE ) %>% kable_styling(font_size = ifelse(knitr:::is_latex_output(), 10, 16), latex_options = c("HOLD_position")) %>% - column_spec(2, width = "0.9in") %>% - column_spec(3, width = "3.3in") + column_spec(1, width = "0.9in") %>% + column_spec(2, width = "3.3in") ``` ```{block lc-asm, type='learncheck', purl=FALSE} @@ -881,6 +932,8 @@ You can access this cheatsheet by going to the RStudio Menu Bar -> Help -> Cheat include_graphics("images/dplyr_cheatsheet-1.png") ``` +On top of data wrangling verbs and examples we presented in this section, if you'd like to see more examples of using the `dplyr` package for data wrangling check out [Chapter 5](http://r4ds.had.co.nz/transform.html) of Garrett Grolemund and Hadley Wickham's and Garrett's book [@rds2016]. + -### Importing via RStudio's interface +### Using RStudio's interface Let's read in the exact same data saved in Excel format, but this time via RStudio's graphical interface instead of via the R console. First download the Excel file `dem_score.xlsx` by clicking here, then 1. Go to the Files panel of RStudio. -2. Navigate to the directory where your downloaded `dem_score.xlsx` is saved. -3. Click on `dem_score.xlsx` +2. Navigate to the directory i.e. folder on your computer where the downloaded `dem_score.xlsx` Excel file is saved. +3. Click on `dem_score.xlsx`. 4. Click "Import Dataset..." At this point you should see an image like this: ![](images/read_excel.png) -After clicking on the "Import" button on the bottom right RStudio save this spreadsheet's data in a data frame called `dem_score` and display its contents in the spreadsheet viewer. Furthermore on the bottom right you'll see the code that read in your data in the console; you can copy and paste this code to reload your data again later automatically instead of repeating the above manual process. +After clicking on the "Import" button on the bottom right RStudio, RStudio will save this spreadsheet's data in a data frame called `dem_score` and display its contents in the spreadsheet viewer. Furthermore, note in the bottom right of the above image there exists a "Code Preview": you can copy and paste this code to reload your data again later automatically instead of repeating the above manual point-and-click process. ---- +*** -## Tidy data +## Tidy data {#tidy-data-ex} -Let's now switch gears and learn about the concept of "tidy" data format. Let's start with a motivating example. Let's consider the `drinks` data frame included in the `fivethirtyeight` data. Run the +Let's now switch gears and learn about the concept of "tidy" data format by starting with a motivating example. Let's consider the `drinks` data frame included in the `fivethirtyeight` data. 
Run the following: ```{r} drinks ``` -After reading the help file by running `?drinks` we see that is a data frame containing results from a survey of the average number of servings of beer, spirits, and wine consumed for 193 countries originally reported on the data journalism website FiveThirtyEight.com's article ["Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?"](https://fivethirtyeight.com/features/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/). +After reading the help file by running `?drinks`, we see that `drinks` is a data frame containing results from a survey of the average number of servings of beer, spirits, and wine consumed for 193 countries. This data was originally reported on the data journalism website FiveThirtyEight.com in Mona Chalabi's article ["Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?"](https://fivethirtyeight.com/features/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/) + +Let's apply some of the data wrangling verbs we learned in Chapter \@ref(wrangling) on the `drinks` data frame. Let's + +1. `filter()` the `drinks` data frame to only consider 4 countries (the United States, China, Italy, and Saudi Arabia) then +1. `select()` all columns except `total_litres_of_pure_alcohol` by using `-` sign, then +1. `rename()` the variables `beer_servings`, `spirit_servings`, and `wine_servings` to `beer`, `spirit`, and `wine` respectively -Let's filter `drinks` to only consider 4 countries: the US, China, Italy, and Saudi Arabia; drop the column `total_litres_of_pure_alcohol` by using `select()` with a `-` sign; and rename the variables `beer_servings`, `spirit_servings`, and `wine_servings` to read `beer`, `spirit`, and `wine`. +and save the resulting data frame in `drinks_smaller`. ```{r} drinks_smaller <- drinks %>% @@ -134,7 +139,7 @@ drinks_smaller <- drinks %>% drinks_smaller ``` -Using `drinks_smaller`, how would we create the side-by-side AKA dodged barplot in Figure \@ref(fig:drinks-smaller); recall we saw barplots displaying two categorical variables in Section \@ref(two-categ-barplot). +Using the `drinks_smaller` data frame, how would we create the side-by-side AKA dodged barplot in Figure \@ref(fig:drinks-smaller)? Recall we saw barplots displaying two categorical variables in Section \@ref(two-categ-barplot). ```{r drinks-smaller, fig.cap="Alcohol consumption in 4 countries.", fig.height=3.5, echo=FALSE} drinks_smaller_tidy <- drinks_smaller %>% @@ -146,20 +151,27 @@ ggplot(drinks_smaller_tidy, aes(x=country, y=servings, fill=type)) + Let's break down the Grammar of Graphics: -1. The categorical variable `country` with four levels (China, Italy, Saudi Arabia, USA) is mapped to the `x`-position of the bars. -1. The numerical variable `servings` is mapped to the `y`-position of the bars, in other words the height. -1. The cateogircal variable `type` with three levels (beer, spirit, wine) is mapped to the `fill` color of the bars. +1. The categorical variable `country` with four levels (China, Italy, Saudi Arabia, USA) would have to be mapped to the `x`-position of the bars. +1. The numerical variable `servings` would have to be mapped to the `y`-position of the bars, in other words the height of the bars. +1. The categorical variable `type` with three levels (beer, spirit, wine) who have to be mapped to the `fill` color of the bars. 
-Observe however that `drinks_smaller` has *three separate columns* for `beer`, `spirit`, and `wine`, whereas in order to recreate the side-by-side AKA dodged barplot in Figure \@ref(fig:drinks-smaller) we would need a *single column* `type` with three possible values: `beer`, `spirit`, and `wine`. In other words, for us to be able to create this barplot, our data frame would have to look like: +Observe however that `drinks_smaller` has *three separate variables* for `beer`, `spirit`, and `wine`, whereas in order to recreate the side-by-side AKA dodged barplot in Figure \@ref(fig:drinks-smaller) we would need a *single variable* `type` with three possible values: `beer`, `spirit`, and `wine`, which we would then map to the `fill` aesthetic. In other words, for us to be able to create the barplot in Figure \@ref(fig:drinks-smaller), our data frame would have to look like this: ```{r} drinks_smaller_tidy ``` -Observe that while `drinks_smaller` and `drinks_smaller_tidy` are both rectangular in shape and contain the same data on 4 countries average number of servings for 3 alcohol types, totalling 12 numerical values, they are formatted differently. `drinks_smaller` is formatted in what's known as ["wide"](https://en.wikipedia.org/wiki/Wide_and_narrow_data) format, whereas `drinks_smaller_tidy` is formated in what's known as ["long/narrow"](https://en.wikipedia.org/wiki/Wide_and_narrow_data#Narrow). "Long/narrow" format is as known in R circles as "tidy" format. +Let's compare the `drinks_smaller_tidy` with the `drinks_smaller` data frame from earlier: +```{r} +drinks_smaller +``` -### What is tidy data? +Observe that while `drinks_smaller` and `drinks_smaller_tidy` are both rectangular in shape and contain the same 12 numerical values (3 alcohol types $\times$ 4 countries), they are formatted differently. `drinks_smaller` is formatted in what's known as ["wide"](https://en.wikipedia.org/wiki/Wide_and_narrow_data) format, whereas `drinks_smaller_tidy` is formatted in what's known as ["long/narrow"](https://en.wikipedia.org/wiki/Wide_and_narrow_data#Narrow). In the context of using R, long/narrow format is also known as "tidy" format. Furthermore, in order to use the `ggplot2` and `dplyr` packages for data visualization and data wrangling, your input data frames *must* be in "tidy" format. So all non-"tidy" data must be converted to "tidy" format first. + +Before we show you how to convert non-"tidy" data frames like `drinks_smaller` to "tidy" data frames like `drinks_smaller_tidy`, let's go over the explicit definition of "tidy" data. + +### Definition of "tidy" data You have surely heard the word "tidy" in your life: @@ -168,7 +180,7 @@ You have surely heard the word "tidy" in your life: * Marie Kondo's best-selling book [_The Life-Changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing_](https://www.amazon.com/Life-Changing-Magic-Tidying-Decluttering-Organizing/dp/1607747308/ref=sr_1_1?ie=UTF8&qid=1469400636&sr=8-1&keywords=tidying+up) and Netflix TV series [_Tidying Up with Marie Kondo_](https://www.netflix.com/title/80209379). * "I am not by any stretch of the imagination a tidy person, and the piles of unread books on the coffee table and by my bed have a plaintive, pleading quality to me - 'Read me, please!'" - Linda Grant -What does it mean for your data to be "tidy"? While "tidy" has a clear english meaning of "organized", "tidy" in the context of data science using R means that your data follows a standardized format. 
We will follow Hadley Wickham's definition of *tidy data* here [@tidy]: +What does it mean for your data to be "tidy"? While "tidy" has a clear English meaning of "organized", "tidy" in the context of data science using R means that your data follows a standardized format. We will follow Hadley Wickham's definition of *tidy data* here [@tidy]: > A dataset is a collection of values, usually either numbers (if quantitative) or strings AKA text data (if qualitative). Values are organised in two ways. @@ -185,13 +197,13 @@ are matched up with observations, variables and types. In *tidy data*: > 2. Each observation forms a row. > 3. Each type of observational unit forms a table. -```{r tidyfig, echo=FALSE, fig.cap="Tidy data graphic from http://r4ds.had.co.nz/tidy-data.html"} +```{r tidyfig, echo=FALSE, fig.cap="Tidy data graphic from [R for Data Science](http://r4ds.had.co.nz/tidy-data.html)."} knitr::include_graphics("images/tidy-1.png") ``` -For example, say the following table consists of stock prices: +For example, say you have the following table of stock prices in Table \@ref(tab:non-tidy-stocks): -```{r echo=FALSE} +```{r non-tidy-stocks, echo=FALSE} stocks <- data_frame( Date = as.Date('2009-01-01') + 0:4, `Boeing Stock Price` = paste("$", c("173.55", "172.61", "173.86", "170.77", "174.29"), sep = ""), @@ -209,9 +221,9 @@ stocks %>% latex_options = c("HOLD_position")) ``` -Although the data are neatly organized in a rectangular spreadsheet-type format, they are not in tidy format since there are three variables corresponding to three unique pieces of information (Date, Stock Name, and Stock Price), but there are not three columns. In tidy data format each variable should be its own column, as shown below. Notice that both tables present the same information, but in different formats. +Although the data are neatly organized in a rectangular spreadsheet-type format, they are not in tidy format because while there are three variables corresponding to three unique pieces of information (Date, Stock Name, and Stock Price), there are not three columns. In "tidy" data format each variable should be its own column, as shown in Table \@ref(tab:tidy-stocks). Notice that both tables present the same information, but in different formats. -```{r echo=FALSE} +```{r tidy-stocks, echo=FALSE} stocks_tidy <- stocks %>% rename( Boeing = `Boeing Stock Price`, @@ -229,9 +241,9 @@ stocks_tidy %>% latex_options = c("HOLD_position")) ``` -However, consider the following table +Now we have the requisite three columns Date, Stock Name, and Stock Price. On the other hand, consider the data in Table \@ref(tab:tidy-stocks-2). -```{r echo=FALSE} +```{r tidy-stocks-2, echo=FALSE} stocks <- data_frame( Date = as.Date('2009-01-01') + 0:4, `Boeing Price` = paste("$", c("173.55", "172.61", "173.86", "170.77", "174.29"), sep = ""), @@ -248,17 +260,31 @@ stocks %>% latex_options = c("HOLD_position")) ``` -In this case, even though the variable "Boeing Price" occurs again, the data *is* tidy since there are three variables corresponding to three unique pieces of information (Date, Boeing stock price, and the weather that particular day). +In this case, even though the variable "Boeing Price" occurs just like in our non-"tidy" data in Table \@ref(tab:non-tidy-stocks), the data *is* "tidy" since there are three variables corresponding to three unique pieces of information: Date, Boeing stock price, and the weather that particular day. 
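Although we won't introduce the necessary function until the next subsection, here is a minimal sketch of how the non-"tidy" stock prices in Table \@ref(tab:non-tidy-stocks) could be converted into the "tidy" format of Table \@ref(tab:tidy-stocks) using the `gather()` function from the `tidyr` package. This sketch assumes the non-"tidy" data is saved in a data frame called `stocks` whose first column is `Date` and whose remaining columns are the individual stock prices:

```{r, eval=FALSE}
# A sketch only: collapse all stock price columns (everything except Date)
# into a key column `Stock Name` and a value column `Stock Price`.
stocks %>% 
  gather(key = `Stock Name`, value = `Stock Price`, -Date)
```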
+
+```{block, type='learncheck'}
+**_Learning check_**
+```
+
+**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are common characteristics of "tidy" data frames?
+
+**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What makes "tidy" data frames useful for organizing data?
+
+```{block, type='learncheck', purl=FALSE}
+```


-### Converting to "tidy" format

-In this book so far, you've only seen data frames that were already in "tidy" format. Furthermore for the rest of this book, you'll only see data frames that are already in "tidy" format. This is not always the case however with data in the wild. If your original data is in wide AKA non-"tidy" format and you would like to use the `ggplot2` or `dplyr` packages on it, you will have to convert it "tidy" format using the `gather()` function in the `tidyr` package [@R-tidyr]. Going back to our `drinks_smaller` data frame
+### Converting to "tidy" data
+
+In this book so far, you've only seen data frames that were already in "tidy" format. Furthermore for the rest of this book, you'll mostly only see data frames that are already in "tidy" format as well. This is not always the case however with data in the wild. If your original data frame is in wide i.e. non-"tidy" format and you would like to use the `ggplot2` package for data visualization or the `dplyr` package for data wrangling, you will first have to convert it to "tidy" format using the `gather()` function in the `tidyr` package [@R-tidyr]. 
+
+Going back to our `drinks_smaller` data frame from earlier:

```{r}
drinks_smaller
```

-let's convert it to "tidy" format by using the `gather()` function from the `tidyr` package:
+We convert it to "tidy" format by using the `gather()` function from the `tidyr` package as follows:

```{r}
drinks_smaller_tidy <- drinks_smaller %>% 
@@ -266,13 +292,21 @@ drinks_smaller_tidy <- drinks_smaller %>% 
 drinks_smaller_tidy
```

-We set the
+We set the arguments to `gather()` as follows:

-1. `key` argument to be the name of the column/variable in the new "tidy" frame that contains the column names of the original data frame that you want to gather. Observe we set `key = type` and in the resulting `drinks_smaller_tidy` data frame, the column `type` contains the names `beer`, `spirit`, and `serving`.
-1. `value` argument to be the name of the column/variable in the "tidy" frame that contains the rows and columns of values in the original data frame you want to gather. Observe we set `value = servings` and in the resulting `drinks_smaller_tidy` data frame, the column `value` contains the 4 $\times$ 3 numerical values.
-1. Third argument to be the columns you want to or don't want to gather. Observe we set this to `-country` indicating that we don't want to gather the `country` variable and in the resulting `drinks_smaller_tidy` data frame there is still a variable `country`.
+1. `key` is the name of the column/variable in the new "tidy" frame that contains the column names of the original data frame that you want to tidy. Observe how we set `key = type` and in the resulting `drinks_smaller_tidy` the column `type` contains the three types of alcohol `beer`, `spirit`, and `wine`.
+1. `value` is the name of the column/variable in the "tidy" frame that contains the rows and columns of values in the original data frame you want to tidy. Observe how we set `value = servings` and in the resulting `drinks_smaller_tidy` the column `servings` contains the 4 $\times$ 3 = 12 numerical values.
+1. 
The third argument specifies the columns you either want to or don't want to tidy. Observe how we set this to `-country`, indicating that we don't want to tidy the `country` variable in `drinks_smaller` and rather only `beer`, `spirit`, and `wine`.
+
+The third argument is a little nuanced, so let's consider another example. Note the code below is very similar, but now the third argument specifies which columns we'd want to tidy, `c(beer, spirit, wine)`, instead of the columns we don't want to tidy, `-country`. Note the use of `c()` to create a vector of the columns in `drinks_smaller` that we'd like to tidy. If you run the code below, you'll see that the resulting `drinks_smaller_tidy` is the same.
+
+```{r, eval=FALSE}
+drinks_smaller_tidy <- drinks_smaller %>% 
+  gather(key = type, value = servings, c(beer, spirit, wine))
+drinks_smaller_tidy
+```
+
+With our `drinks_smaller_tidy` "tidy" format data frame, we can now produce a side-by-side AKA dodged barplot using `geom_col()` and not `geom_bar()`, since we would like to map the `servings` variable to the `y`-aesthetic of the bars.

```{r}
ggplot(drinks_smaller_tidy, aes(x=country, y=servings, fill=type)) + 
@@ -286,36 +320,34 @@ Converting "wide" format data to "tidy" format often confuses new R users. The o
 **_Learning check_**
```

-**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Consider the following data frame of average number of servings of beer, spirits, and wine consumption in three countries as reported in the FiveThirtyEight article [Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?](https://fivethirtyeight.com/features/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/)
+**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Take a look at the `airline_safety` data frame included in the `fivethirtyeight` package. Run the following:

-```{r echo=FALSE}
-drinks_sub <- drinks %>%
-  select(-total_litres_of_pure_alcohol) %>%
-  filter(country %in% c("USA", "Canada", "South Korea"))
-drinks_sub_tidy <- drinks_sub %>%
-  gather(type, servings, -c(country)) %>%
-  mutate(
-    type = str_sub(type, start=1, end=-10)
-  ) %>%
-  arrange(country, type) %>%
-  rename(`alcohol type` = type)
-drinks_sub
+```{r, eval=FALSE}
+airline_safety
```

-This data frame is not in tidy format. What would it look like if it were?
+After reading the help file by running `?airline_safety`, we see that `airline_safety` is a data frame containing information on different airline companies' safety records. This data was originally reported on the data journalism website FiveThirtyEight.com in Nate Silver's article ["Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?"](https://fivethirtyeight.com/features/should-travelers-avoid-flying-airlines-that-have-had-crashes-in-the-past/). Let's ignore the `incl_reg_subsidiaries` and `avail_seat_km_per_week` variables for simplicity:
+
+```{r}
+airline_safety_smaller <- airline_safety %>% 
+  select(-c(incl_reg_subsidiaries, avail_seat_km_per_week))
+airline_safety_smaller
+```
+
+This data frame is not in "tidy" format. How would you convert this data frame to be in "tidy" format, in particular so that it has a variable `incident_type_years` indicating the incident type/year and a variable `count` of the counts?
```{block, type='learncheck', purl=FALSE} ``` ---- +*** ### `nycflights13` package -Recall the `nycflights13` package with data about all domestic flights departing from New York City in 2013 that we introduced in Section \@ref(nycflights13) and used extensively in Chapter \@ref(viz) to create visualizations. In particular, let's revisit the `flights` data frame by running `View(flights)` in your console. We see that `flights` has a rectangular shape with each row corresponding to a different flight and each column corresponding to a characteristic of that flight. This matches exactly with how Hadley Wickham defined tidy data: +Recall the `nycflights13` package with data about all domestic flights departing from New York City in 2013 that we introduced in Section \@ref(nycflights13) and used extensively in Chapter \@ref(viz) on data visualization and Chapter \@ref(wrangling) on data wrangling. Let's revisit the `flights` data frame by running `View(flights)`. We saw that `flights` has a rectangular shape with each of its `r scales::comma(nrow(flights))` rows corresponding to a flight and each of its `r ncol(flights)` columns corresponding to different characteristics/measurements of each flight. This matches exactly with our definition of "tidy" data from above. 1. Each variable forms a column. 2. Each observation forms a row. @@ -324,56 +356,27 @@ But what about the third property of "tidy" data? > 3. Each type of observational unit forms a table. -**Observational units**: - -We identified earlier that the observational unit in the `flights` dataset is an individual flight. And we have shown that this dataset consists of `r scales::comma(nrow(flights))` flights with `r ncol(flights)` variables. In other words, rows of this dataset don't refer to a measurement on an airline or on an airport; they refer to characteristics/measurements on a given flight from New York City in 2013. - -Also included in the `nycflights13` package are datasets with different observational units [@R-nycflights13]: - -* `airlines`: translation between two letter IATA carrier codes and names (`r nrow(nycflights13::airlines)` in total) -* `planes`: construction information about each of `r scales::comma(nrow(nycflights13::planes))` planes used -* `weather`: hourly meteorological data (about `r nycflights13::weather %>% count(origin) %>% .[["n"]] %>% mean() %>% round()` observations) for each of the three NYC airports -* `airports`: airport names and locations - -The organization of this data follows the third "tidy" data property: observations corresponding to the same observational unit should be saved in the same table/data frame. Another example involves a spreadsheet of all students enrolled in a university along with information about them, such as name, gender, and date of birth. Each row represents an individual student, which is the observational unit in question. - -**Identification vs measurement variables**: - -There is a subtle difference between the kinds of variables that you will encounter in data frames: *measurement variables* and *identification variables*. The `airports` data frame you worked with above contains both these types of variables. Recall that in `airports` the observational unit is an airport, and thus each row corresponds to one particular airport. Let's pull them apart using the `glimpse` function: - -```{r} -glimpse(airports) -``` - -The variables `faa` and `name` are what we will call *identification variables*: variables that uniquely identify each observational unit. 
They are mainly used to provide a unique name to each observational unit, thereby allowing us to uniquely identify them. `faa` gives the unique code provided by the FAA for that airport, while the `name` variable gives the longer more natural name of the airport. The remaining variables (`lat`, `lon`, `alt`, `tz`, `dst`, `tzone`) are often called *measurement* or *characteristic* variables: variables that describe properties of each observational unit, in other words each observation in each row. For example, `lat` and `long` describe the latitude and longitude of each airport. - -So in our above example of a spreadsheet of all students enrolled at a university, email address could be treated as an identical variable since it uniquely identifies each observational unit i.e. each student, while date of birth could not since it is possible (and highly probable) that two students share the same birthday. - -Furthermore, sometimes a single variable might not be enough to uniquely identify each observational unit: combinations of variables might be needed (see Learning Check below). While it is not an absolute rule, for organizational purposes it is considered good practice to have your identification variables in the far left-most columns of your data frame. +Recall that we also saw in Section \@ref(exploredataframes) that the observational unit for the `flights` data frame is an individual flight. In other words, the rows of the `flights` data frame refer to characteristics/measurements of individual flights. Also included in the `nycflights13` package are other data frames with their rows representing different observational units [@R-nycflights13]: -```{block lc3-3c, type='learncheck'} -**_Learning check_** -``` - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What properties of the observational unit do each of `lat`, `lon`, `alt`, `tz`, `dst`, and `tzone` describe for the `airports` data frame? Note that you may want to use `?airports` to get more information. +* `airlines`: translation between two letter IATA carrier codes and names (`r nrow(nycflights13::airlines)` in total). i.e. the observational unit is an airline company. +* `planes`: construction information about each of `r scales::comma(nrow(nycflights13::planes))` planes used. i.e. the observational unit is an aircraft. +* `weather`: hourly meteorological data (about `r nycflights13::weather %>% count(origin) %>% .[["n"]] %>% mean() %>% round()` observations) for each of the three NYC airports. i.e. the observational unit is an hourly measurement. +* `airports`: airport names and locations. i.e. the observational unit is an airport. -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Provide the names of variables in a data frame with at least three variables in which one of them is an identification variable and the other two are not. In other words, create your own tidy dataset that matches these conditions. - -```{block, type='learncheck', purl=FALSE} -``` +The organization of the information into these five data frames follow the third "tidy" data property: observations corresponding to the same observational unit should be saved in the same table i.e. data frame. You could think of this property as the old English expression: "birds of a feather flock together." - - ---- +*** ## Case study: Democracy in Guatemala {#case-study-tidy} -In this section, we'll show you another example of how to convert a dataset that isn't in "tidy" format i.e. "wide" format, to a dataset that is in "tidy" format i.e. 
"long/narrow" format using the `gather()` function from the `tidyr` package.. Let's use the `dem_score` data frame we imported in Section \@ref(csv), but focus on only data corresponding to the country of Guatemala. +In this section, we'll show you another example of how to convert a data frame that isn't in "tidy" format i.e. "wide" format, to a data frame that is in "tidy" format i.e. "long/narrow" format. We'll do this using the `gather()` function from the `tidyr` package again. Furthermore, we'll make use of some of the `ggplot2` data visualization and `dplyr` data wrangling tools you learned in Chapters \@ref(viz) and \@ref(wrangling). + +Let's use the `dem_score` data frame we imported in Section \@ref(csv), but focus on only data corresponding to Guatemala. ```{r} guat_dem <- dem_score %>% @@ -381,51 +384,49 @@ guat_dem <- dem_score %>% guat_dem ``` -Now let's produce a plot showing how the democracy scores have changed over the 40 years from 1952 to 1992 for Guatemala. Let's start by laying out how we would map our aesthetics to variables in the data frame: - -- The `data` frame is `guat_dem` by setting `data = guat_dem` - -What are the names of the variables to plot? We'd like to see how the democracy score has changed over the years. Now we are stuck in a predicament. We see that we have a variable named `country` but its only value is `"Guatemala"`. We have other variables denoted by different year values. Unfortunately, we've run into a dataset that is not in the appropriate format to apply the Grammar of Graphics and `ggplot2`. Remember that `ggplot2` is a package in the `tidyverse` and, thus, needs data to be in a tidy format. We'd like to finish off our mapping of aesthetics to variables by doing something like +Now let's produce a *time-series plot* showing how the democracy scores have changed over the 40 years from 1952 to 1992 for Guatemala. Recall that we saw time-series plot in Section \@ref(linegraphs) on creating linegraphs using `geom_line()`. Let's lay out the Grammar of Graphics we saw in Section \@ref(grammarofgraphics). -- The `aes`thetic mapping is set by `aes(x = year, y = democracy_score)` +First we know we need to set `data = guat_dem` and use a `geom_line()` layer, but what is the aesthetic mapping of variables. We'd like to see how the democracy score has changed over the years, so we need to map: -but this is not possible with our wide-formatted data. We need to take the values of the current column names in `guat_dem` (aside from `country`) and convert them into a new variable that will act as a key called `year`. Then, we'd like to take the numbers on the inside of the table and turn them into a column that will act as values called `democracy_score`. Our resulting data frame will have three columns: `country`, `year`, and `democracy_score`. +* `year` to the x-position aesthetic and +* `democracy_score` to the y-position aesthetic -The `gather()` function in the `tidyr` package can complete this task for us. The first argument to `gather()`, just as with `ggplot2()`, is the `data` argument where we specify which data frame we would like to tidy. The next two arguments to `gather()` are `key` and `value`, which specify what we'd like to call the new columns that convert our wide data into long format. Lastly, we include a specification for variables we'd like to NOT include in this tidying process using a `-`. +Now we are stuck in a predicament, much like with our `drinks_smaller` example in Section \@ref(tidy-data-ex). 
We see that we have a variable named `country`, but its only value is `"Guatemala"`. We have other variables denoted by different year values. Unfortunately, the `guat_dem` data frame is not "tidy" and hence is not in the appropriate format to apply the Grammar of Graphics and thus we cannot use the `ggplot2` package. We need to take the values of the columns corresponding to years in `guat_dem` and convert them into a new "key" variable called `year`. Furthermore, we'd like to take the democracy scores on the inside of the table and turn them into a new "value" variable called `democracy_score`. Our resulting data frame will thus have three columns: `country`, `year`, and `democracy_score`. - - - +Recall that the `gather()` function in the `tidyr` package can complete this task for us: ```{r} -guat_tidy <- guat_dem %>% +guat_dem_tidy <- guat_dem %>% gather(key = year, value = democracy_score, -country) -guat_tidy +guat_dem_tidy ``` -We can now create the plot to show how the democracy score of Guatemala changed from 1952 to 1992 using a linegraph and `ggplot2`. +We set the arguments to `gather()` as follows: -```{r errors=TRUE} -ggplot(guat_tidy, aes(x = year, y = democracy_score)) + - geom_line() -``` +1. `key` is the name of the column/variable in the new "tidy" frame that contains the column names of the original data frame that you want to tidy. Observe how we set `key = year` and in the resulting `guat_dem_tidy` the column `year` contains the years where the Guatemala's democracy score were measured. +1. `value` is the name of the column/variable in the "tidy" frame that contains the rows and columns of values in the original data frame you want to tidy. Observe how we set `value = democracy_score` and in the resulting `guat_dem_tidy` the column `democracy_score` contains the 1 $\times$ 9 = 9 democracy scores. +1. The third argument are the columns you either want to or don't want to tidy. Observe how we set this to `-country` indicating that we don't want to tidy the `country` variable in `guat_dem` and rather only `1952` through `1992`. - + -Observe that the `year` variable in `guat_tidy` is stored as a character vector since we had to circumvent the naming rules in R by adding backticks around the different year columns in `guat_dem`. This is leading to `ggplot` not knowing exactly how to plot a line using a categorical variable. We can fix this by using the `parse_number()` function in the `readr` package and then specify the horizontal axis label to be `"year"`: +However, observe in the output for `guat_dem_tidy` that the `year` variable is of type `chr` or character. Before we can plot this variable on the x-axis, we need to convert it into a numerical variable using the `as.numeric()` function within the `mutate()` function, which we saw in Section \@ref(mutate) on mutating existing variables to create new ones. -```{r guatline, fig.cap="Guatemala's democracy score ratings from 1952 to 1992"} -ggplot(guat_tidy, aes(x = parse_number(year), y = democracy_score)) + - geom_line() + - labs(x = "year") +```{r} +guat_dem_tidy <- guat_dem_tidy %>% + mutate(year = as.numeric(year)) ``` -We'll see in Chapter \@ref(wrangling) how we could use the `mutate()` function to change `year` to be a numeric variable instead after we have done our tidying. 
Notice now that the mappings of aesthetics to variables make sense in Figure \@ref(fig:guatline): +We can now create the plot to show how the democracy score of Guatemala changed from 1952 to 1992 using a `geom_line()`: + +```{r errors=TRUE} +ggplot(guat_dem_tidy, aes(x = year, y = democracy_score)) + + geom_line() + + labs(x = "Year", y = "Democracy Score", title = "Democracy score in Guatemala 1952-1992") +``` -- The `data` frame is `guat_tidy` by setting `data = dem_score` -- The `x` `aes`thetic is mapped to `year` -- The `y` `aes`thetic is mapped to `democracy_score` -- The `geom_`etry chosen is `line` ```{block lc-tidying, type='learncheck', purl=FALSE} **_Learning check_** @@ -441,7 +442,7 @@ a tidy data frame and assign the name of `dem_score_tidy` to the resulting long- ---- +*** @@ -449,7 +450,7 @@ a tidy data frame and assign the name of `dem_score_tidy` to the resulting long- ### `tidyverse` package -Notice at the beginning of the Chapter we loaded the following four packages: +Notice at the beginning of the chapter we loaded the following four packages, which are among the four of the most frequently used R packages for data science: ```{r, eval=FALSE} library(dplyr) @@ -458,7 +459,7 @@ library(readr) library(tidyr) ``` -In fact, these are among the four of the most frequently used R packages for data science. There is a much quicker way to load these packages than by individually loading them as we did above. We can install and load the `tidyverse` package. The `tidyverse` package acts as an "umbrella" package whereby installing/loading it will install/load multiple packages at once for you. So that after installing the `tidyverse` package as you would a normal package, running this: +There is a much quicker way to load these packages than by individually loading them as we did above: by installing and loading the `tidyverse` package. The `tidyverse` package acts as an "umbrella" package whereby installing/loading it will install/load multiple packages at once for you. So after installing the `tidyverse` package as you would a normal package, running this: ```{r, eval=FALSE} library(tidyverse) @@ -479,44 +480,10 @@ library(forcats) You've seen the first 4 of the these packages: `ggplot2` for data visualization, `dplyr` for data wrangling, `tidyr` for converting data to "tidy" format, and `readr` for importing spreadsheet data into R. The remaining packages (`purrr`, `tibble`, `stringr`, and `forcats`) are left for a more advanced book; check out [R for Data Science](http://r4ds.had.co.nz/) to learn about these packages. -The `tidyverse` "umbrella" package gets its name from the fact that all functions in all its constituent packages are designed to that all inputs/argument data frames are in "tidy" format and all output data frames are in "tidy" format as well. This acts as a standardization to make transitions between the various functions in these packages as seamless as possible. +The `tidyverse` "umbrella" package gets its name from the fact that all functions in all its constituent packages are designed to that all inputs/argument data frames are in "tidy" format and all output data frames are in "tidy" format as well. This standardization of input and output data frames makes transitions between the various functions in these packages as seamless as possible. - - -### Optional: Normal forms of data - -The datasets included in the `nycflights13` package are in a form that minimizes redundancy of data. 
We will see that there are ways to _merge_ (or _join_) the different tables together easily. We are capable of doing so because each of the tables have _keys_ in common to relate one to another. This is an important property of **normal forms** of data. The process of decomposing data frames into less redundant tables without losing information is called **normalization**. More information is available on [Wikipedia](https://en.wikipedia.org/wiki/Database_normalization). - -We saw an example of this above with the `airlines` dataset. While the `flights` data frame could also include a column with the names of the airlines instead of the carrier code, this would be repetitive since there is a unique mapping of the carrier code to the name of the airline/carrier. - -Below an example is given showing how to **join** the `airlines` data frame together with the `flights` data frame by linking together the two datasets via a common **key** of `"carrier"`. Note that this "joined" data frame is assigned to a new data frame called `joined_flights`. The **key** variable that we frequently join by is one of the *identification variables* mentioned above. - -```{r message=FALSE} -joined_flights <- inner_join(x = flights, y = airlines, by = "carrier") -``` - -```{r eval=FALSE} -View(joined_flights) -``` - -If we `View()` this dataset, we see a new variable has been created called `name`. (We will see in Subsection \@ref(rename) ways to change `name` to a more descriptive variable name.) More discussion about joining data frames together will be given in Chapter \@ref(wrangling). We will see there that the names of the columns to be linked need not match as they did here with `"carrier"`. - -```{block tidy_review, type='learncheck'} -**_Learning check_** -``` - - **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are common characteristics of "tidy" datasets? - - **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What makes "tidy" datasets useful for organizing data? - - **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are some advantages of data in normal forms? What are some disadvantages? - -```{block, type='learncheck', purl=FALSE} -``` - - ### Additional resources An R script file of all R code used in this chapter is available [here](scripts/05-tidy.R). @@ -538,7 +505,7 @@ Review questions have been designed using the `fivethirtyeight` R package [@R-fi ### What's to come? -Congratulations! We've completed the "Data Science via the tidyverse" portion of this book! We'll now move to the "data modeling" portion in Chapters \@ref(regression) and \@ref(multiple-regression), where you'll leverage your data visualization and wrangling skills to model relationships between different variables in datasets. However, we're going to leave the Chapter \@ref(inference-for-regression) on "Inference for Regression" until after we've covered statistical inference. +Congratulations! We've completed the "Data Science via the tidyverse" portion of this book! We'll now move to the "data modeling" portion in Chapters \@ref(regression) and \@ref(multiple-regression), where you'll leverage your data visualization and wrangling skills to model relationships between different variables in data frames. However, we're going to leave the Chapter \@ref(inference-for-regression) on "Inference for Regression" until after we've covered statistical inference. 
```{r echo=FALSE, fig.cap="ModernDive flowchart - On to Part II!", fig.align='center'} knitr::include_graphics("images/flowcharts/flowchart/flowchart.005.png") diff --git a/06-regression.Rmd b/06-regression.Rmd index 3078a6a86..5dd311860 100755 --- a/06-regression.Rmd +++ b/06-regression.Rmd @@ -10,30 +10,21 @@ rq <- 0 # **`r paste0("(RQ", chap, ".", (rq <- rq + 1), ")")`** knitr::opts_chunk$set( - tidy = FALSE, - out.width = "\\textwidth", - message = FALSE, + tidy = FALSE, + out.width = '\\textwidth', + fig.height = 4, + fig.align='center', warning = FALSE - ) +) + options(scipen = 99, digits = 3) -# This bit of code is a bug fix on asis blocks, which we use to show/not show LC -# solutions, which are written like markdown text. In theory, it shouldn't be -# necessary for knitr versions <=1.11.6, but I've found I still need to for -# everything to knit properly in asis blocks. More info here: -# https://stackoverflow.com/questions/32944715/conditionally-display-block-of-markdown-text-using-knitr -library(knitr) -knit_engines$set(asis = function(options) { - if (options$echo && options$eval) knit_child(text = options$code) -}) +# In knitr::kable printing replace all NA's with blanks +options(knitr.kable.NA = '') -# This controls which LC solutions to show. Options for solutions_shown: "ALL" -# (to show all solutions), or subsets of c('5-1', '5-2','5-3', '5-4'), including -# the null vector c("") to show no solutions. -solutions_shown <- c("") -show_solutions <- function(section){ - return(solutions_shown == "ALL" | section %in% solutions_shown) - } +# Set random number generator see value for replicable pseudorandomness. Why 76? +# https://www.youtube.com/watch?v=xjJ7FheCkCU +set.seed(76) ``` @@ -82,7 +73,6 @@ library(gapminder) library(skimr) ``` - ```{r, message=FALSE, warning=FALSE, echo=FALSE} library(ggplot2) library(dplyr) @@ -104,16 +94,8 @@ library(kableExtra) ``` -### DataCamp {-} - -The introductory basic regression analysis below was the inspiration for a large part of ModernDive co-author [Albert Y. Kim's](https://twitter.com/rudeboybert) DataCamp course "Modeling with Data in the Tidyverse." If you're interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters are Chapter 1 "Introduction to Modeling" and Chapter 2 "Modeling with Basic Regression". -```{r, echo=FALSE, results='asis', purl=FALSE} -image_link(path = "images/datacamp_working_with_data.png", - link = "https://www.datacamp.com/courses/working-with-data-in-the-tidyverse", - html_opts = "height: 150px;", - latex_opts = "width=0.3\\textwidth") -``` +*** @@ -553,6 +535,12 @@ Just as we did for the 21st instructor in the `evals_ch6` dataset (in the first More development of this idea appears in Section \@ref(leastsquares) and we encourage you to read that section after you investigate residuals. + + +*** + + + ## One categorical explanatory variable {#model2} It's an unfortunate truth that life expectancy is not the same across various countries in the world; there are a multitude of factors that are associated with how long people live. International development agencies are very interested in studying these differences in the hope of understanding where governments should allocate resources to address this problem. 
In this section, we'll explore differences in life expectancy in two ways: @@ -639,7 +627,7 @@ ggplot(gapminder2007, aes(x = lifeExp)) + title = "Worldwide life expectancy") ``` -We see that this data is left-skewed/negatively skewed: there are a few countries with very low life expectancies that are bringing down the mean life expectancy. However, the median is less sensitive to the effects of such outliers. Hence the median is greater than the mean in this case. Let's proceed by comparing median and mean life expectancy between continents by adding a `group_by(continent)` to the above code: +We see that this data is left-skewed/negatively skewed: there are a few countries with very low life expectancy that are bringing down the mean life expectancy. However, the median is less sensitive to the effects of such outliers. Hence the median is greater than the mean in this case. Let's proceed by comparing median and mean life expectancy between continents by adding a `group_by(continent)` to the above code: ```{r, eval=TRUE} lifeExp_by_continent <- gapminder2007 %>% @@ -665,9 +653,9 @@ n_countries <- gapminder2007 %>% nrow() n_countries_africa <- gapminder2007 %>% filter(continent == "Africa") %>% nrow() ``` -We see now that there are differences in life expectancies between the continents. For example let's focus on only medians. While the median life expectancy across all $n = `r n_countries`$ countries in 2007 was `r lifeExp_worldwide$median %>% round(3)`, the median life expectancy across the $n =`r n_countries_africa`$ countries in Africa was only `r median_africa`. +We see now that there are differences in life expectancy between the continents. For example let's focus on only medians. While the median life expectancy across all $n = `r n_countries`$ countries in 2007 was `r lifeExp_worldwide$median %>% round(3)`, the median life expectancy across the $n =`r n_countries_africa`$ countries in Africa was only `r median_africa`. -Let's create a corresponding visualization. One way to compare the life expectancies of countries in different continents would be via a faceted histogram. Recall we saw back in the Data Visualization chapter, specifically Section \@ref(facets), that facets allow us to split a visualization by the different levels of a categorical variable or factor variable. In Figure \@ref(fig:catxplot0b), the variable we facet by is `continent`, which is categorical with five levels, each corresponding to the five continents of the world. +Let's create a corresponding visualization. One way to compare the life expectancy of countries in different continents would be via a faceted histogram. Recall we saw back in the Data Visualization chapter, specifically Section \@ref(facets), that facets allow us to split a visualization by the different levels of a categorical variable or factor variable. In Figure \@ref(fig:catxplot0b), the variable we facet by is `continent`, which is categorical with five levels, each corresponding to the five continents of the world. ```{r catxplot0b, warning=FALSE, fig.cap="Life expectancy in 2007"} ggplot(gapminder2007, aes(x = lifeExp)) + @@ -677,7 +665,7 @@ ggplot(gapminder2007, aes(x = lifeExp)) + facet_wrap(~ continent, nrow = 2) ``` -Another way would be via a `geom_boxplot` where we map the categorical variable `continent` to the $x$-axis and the different life expectancies within each continent on the $y$-axis; we do this in Figure \@ref(fig:catxplot1). 
+Another way would be via a `geom_boxplot` where we map the categorical variable `continent` to the $x$-axis and the different life expectancy within each continent on the $y$-axis; we do this in Figure \@ref(fig:catxplot1). ```{r catxplot1, warning=FALSE, fig.cap="Life expectancy in 2007"} ggplot(gapminder2007, aes(x = continent, y = lifeExp)) + @@ -693,7 +681,7 @@ It’s important to remember however that the solid lines in the middle of the b * Africa and Asia have much more spread/variation in life expectancy as indicated by the interquartile range (the height of the boxes). * Oceania has almost no spread/variation, but this might in large part be due to the fact there are only two countries in Oceania: Australia and New Zealand. -Now, let's start making comparisons of life expectancy *between* continents. Let's use Africa as a *baseline for comparsion*. Why Africa? Only because it happened to be first alphabetically, we could have just as appropriately used the Americas as the baseline for comparison. Using the "eyeball test" (just using our eyes to see if anything stands out), we make the following observations about differences in median life expectancy compared to the baseline of Africa: +Now, let's start making comparisons of life expectancy *between* continents. Let's use Africa as a *baseline for comparison*. Why Africa? Only because it happened to be first alphabetically, we could have just as appropriately used the Americas as the baseline for comparison. Using the "eyeball test" (just using our eyes to see if anything stands out), we make the following observations about differences in median life expectancy compared to the baseline of Africa: 1. The median life expectancy of the Americas is roughly 20 years greater. 1. The median life expectancy of Asia is roughly 20 years greater. @@ -811,7 +799,7 @@ Now let's interpret the terms in the estimate column of the regression table. Fi i.e. All four of the indicator variables are equal to 0. Recall we stated earlier that we would treat Africa as the baseline for comparison group. Furthermore, this value corresponds to the group mean life expectancy for all African countries in Table \@ref(tab:continent-mean-life-expectancies). -Next, $b_{\text{Amer}}$ = `continentAmericas = 18.8` is the difference in mean life expectancies of countries in the Americas relative to Africa, or in other words, on average countries in the Americas had life expectancy 18.8 years greater. The fitted value yielded by this equation is: +Next, $b_{\text{Amer}}$ = `continentAmericas = 18.8` is the difference in mean life expectancy of countries in the Americas relative to Africa, or in other words, on average countries in the Americas had life expectancy 18.8 years greater. The fitted value yielded by this equation is: \begin{align} @@ -827,7 +815,7 @@ Next, $b_{\text{Amer}}$ = `continentAmericas = 18.8` is the difference in mean l i.e. in this case, only the indicator function $\mathbb{1}_{\mbox{Amer}}(x)$ is equal to 1, but all others are 0. Recall that 72.9 corresponds to the group mean life expectancy for all countries in the Americas in Table \@ref(tab:continent-mean-life-expectancies). -Similarly, $b_{\text{Asia}}$ = `continentAsia = 15.9` is the difference in mean life expectancies of Asian countries relative to Africa countries, or in other words, on average countries in the Asia had life expectancy 18.8 years greater than Africa. 
The fitted value yielded by this equation is:
+Similarly, $b_{\text{Asia}}$ = `continentAsia = 15.9` is the difference in mean life expectancy of Asian countries relative to African countries, or in other words, on average countries in Asia had life expectancy 15.9 years greater than Africa. The fitted value yielded by this equation is:
\begin{align}
@@ -918,6 +906,11 @@
$-26.9 = 43.8 - 70.7$ is Afghanistan's mean life expectancy minus the mean life expectancy of all Asian countries.
+
+***
+
+
+
## Related topics
### Correlation coefficient {#correlationcoefficient}
@@ -1167,12 +1160,17 @@ In this case, it outputs only variables of interest to us as new regression mode If you're even more curious, take a look at the source code for these functions on [GitHub](https://github.com/moderndive/moderndive/blob/master/R/regression_functions.R).
-## Conclusion
-In this chapter, you've seen what we call "basic regression" when you only have one explanatory variable. In Chapter \@ref(multiple-regression), we'll study *multiple regression* where we have more than one explanatory variable! In particular, we'll see why we've been conducting the residual analyses from Subsections \@ref(model1residuals) and \@ref(model2residuals). We are actually verifying some very important assumptions that must be met for the `std_error` (standard error), `p_value`, `lower_ci` and `upper_ci` (the end-points of the confidence intervals) columns in our regression tables to have valid interpretation. Again, don't worry for now if you don't understand what these terms mean. After the next chapter on multiple regression, we'll dive in!
+***
-### Script of R code
+
+## Conclusion
+
+### Additional resources
An R script file of all R code used in this chapter is available [here](scripts/06-regression.R).
+### What's to come?
+
+In this chapter, you've seen what we call "basic regression" when you only have one explanatory variable. In Chapter \@ref(multiple-regression), we'll study *multiple regression* where we have more than one explanatory variable! In particular, we'll see why we've been conducting the residual analyses from Subsections \@ref(model1residuals) and \@ref(model2residuals). We are actually verifying some very important assumptions that must be met for the `std_error` (standard error), `p_value`, `lower_ci` and `upper_ci` (the end-points of the confidence intervals) columns in our regression tables to have valid interpretation. Again, don't worry for now if you don't understand what these terms mean. After the next chapter on multiple regression, we'll dive in!
diff --git a/07-multiple-regression.Rmd b/07-multiple-regression.Rmd
index 6f9241794..b307c1e23 100644
--- a/07-multiple-regression.Rmd
+++ b/07-multiple-regression.Rmd
@@ -1,4 +1,3 @@
-
# Multiple Regression {#multiple-regression}
```{r, include=FALSE, purl=FALSE} chap <- 7 lc <- 0 rq <- 0 # **`r paste0("(RQ", chap, ".", (rq <- rq + 1), ")")`**
knitr::opts_chunk$set( - tidy = FALSE, - out.width = "\\textwidth", - message = FALSE, + tidy = FALSE, + out.width = '\\textwidth', + fig.height = 4, + fig.align='center', warning = FALSE - ) +) +
options(scipen = 99, digits = 3)
-# This bit of code is a bug fix on asis blocks, which we use to show/not show LC
-# solutions, which are written like markdown text. In theory, it shouldn't be
-# necessary for knitr versions <=1.11.6, but I've found I still need to for
-# everything to knit properly in asis blocks.
More info here: -# https://stackoverflow.com/questions/32944715/conditionally-display-block-of-markdown-text-using-knitr -library(knitr) -knit_engines$set(asis = function(options) { - if (options$echo && options$eval) knit_child(text = options$code) -}) - -# This controls which LC solutions to show. Options for solutions_shown: "ALL" -# (to show all solutions), or subsets of c('5-1', '5-2','5-3', '5-4'), including -# the null vector c("") to show no solutions. -solutions_shown <- c("") -show_solutions <- function(section){ - return(solutions_shown == "ALL" | section %in% solutions_shown) - } +# In knitr::kable printing replace all NA's with blanks +options(knitr.kable.NA = '') + +# Set random number generator see value for replicable pseudorandomness. Why 76? +# https://www.youtube.com/watch?v=xjJ7FheCkCU +set.seed(76) ``` In Chapter \@ref(regression) we introduced ideas related to modeling, in particular that the fundamental premise of modeling is *to make explicit the relationship* between an outcome variable $y$ and an explanatory/predictor variable $x$. Recall further the synonyms that we used to also denote $y$ as the dependent variable and $x$ as an independent variable or covariate. @@ -62,7 +52,6 @@ library(ISLR) # library(skimr) (Causes problems with table linking) ``` - ```{r, message=FALSE, warning=FALSE, echo=FALSE} # Packages needed internally, but not in text: library(mvtnorm) @@ -74,13 +63,8 @@ library(patchwork) ``` -### DataCamp {-} -The approach taken below of using more than one variable of information in models using multiple regression is identical to that taken in ModernDive co-author [Albert Y. Kim's](https://twitter.com/rudeboybert) DataCamp course "Modeling with Data in the Tidyverse." If you're interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters are Chapter 1 "Introduction to Modeling" and Chapter 3 "Modeling with Multiple Regression." - -```{r, echo=FALSE, results='asis'} -image_link(path = "images/datacamp_working_with_data.png", link = "https://www.datacamp.com/courses/working-with-data-in-the-tidyverse", html_opts = "height: 150px;", latex_opts = "width=0.3\\textwidth") -``` +*** @@ -422,6 +406,11 @@ Recall the format of the output: * `residual` corresponds to $y - \widehat{y}$ (the residual) + +*** + + + ## One numerical & one categorical explanatory variable {#model4} Let's revisit the instructor evaluation data introduced in Section \@ref(model1), where we studied the relationship between instructor evaluation scores and their beauty scores. This analysis suggested that there is a positive relationship between `bty_avg` and `score`, in other words as instructors had higher beauty scores, they also tended to have higher teaching evaluation scores. Now let's say instead of `bty_avg` we are interested in the numerical explanatory variable $x_1$ `age` and furthermore we want to use a second explanatory variable $x_2$, the (binary) categorical variable `gender`. 
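Before working through the details, here is a rough sketch of where this section is headed. This is not the exact code used later in the section: it assumes the instructor evaluations data is available as the `evals` data frame from the `moderndive` package with the variables `score`, `age`, and `gender` described above (the section may use a differently named, `select()`ed copy), and the object name `score_model_age_gender` is ours for illustration only.

```{r, eval=FALSE}
# Sketch only: a regression model with one numerical explanatory variable
# (age) and one categorical explanatory variable (gender). The data frame
# and object names used later in this section may differ.
score_model_age_gender <- lm(score ~ age + gender, data = evals)
get_regression_table(score_model_age_gender)
```

As with the basic regression models of Chapter \@ref(regression), `get_regression_table()` from the `moderndive` package turns the fitted `lm()` object into a tidy regression table.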
@@ -526,7 +515,6 @@ get_regression_table(score_model_2) %>% The modeling equation for this scenario is: - \begin{align} \widehat{y} &= b_0 + b_1 \cdot x_1 + b_2 \cdot x_2 \\ \widehat{\mbox{score}} &= b_0 + b_{\mbox{age}} \cdot \mbox{age} + b_{\mbox{male}} \cdot \mathbb{1}_{\mbox{is male}}(x) @@ -668,6 +656,10 @@ Recall the format of the output: +*** + + + ## Related topics ### More on the correlation coefficient {#correlationcoefficient2} @@ -788,17 +780,23 @@ ggplot(Credit, aes(x = Income, y = Balance)) + --> + +*** + + + ## Conclusion -### What's to come? +### Additional resources + +An R script file of all R code used in this chapter is available [here](scripts/07-multiple-regression.R). -Congratulations! We're ready to proceed to the third portion of this book: "statistical inference" using a new package called `infer`. Once we've covered Chapters \@ref(sampling) on sampling, \@ref(confidence-intervals) on confidence intervals, and \@ref(hypothesis-testing) on hypothesis testing, we'll come back to the models we've seen in "data modeling" in Chapter \@ref(inference-for-regression) on inference for regression. As we said at the end of Chapter \@ref(regression), we'll see why we've been conducting the residual analyses from Subsections \@ref(model3residuals) and \@ref(model4residuals). We are actually verifying some very important assumptions that must be met for the `std_error` (standard error), `p_value`, `conf_low` and `conf_high` (the end-points of the confidence intervals) columns in our regression tables to have valid interpretation. +### What's to come? -Up next: +Congratulations! We're ready to proceed to the third portion of this book: "statistical inference" using a new package called `infer`. Once we've covered Chapters \@ref(sampling) on sampling, \@ref(confidence-intervals) on confidence intervals, and \@ref(hypothesis-testing) on hypothesis testing, we'll come back to the models we've seen in "data modeling" in Chapter \@ref(inference-for-regression) on inference for regression. As we said at the end of Chapter \@ref(regression), we'll see why we've been conducting the residual analyses from Subsections \@ref(model3residuals) and \@ref(model4residuals). We are actually verifying some very important assumptions that must be met for the `std_error` (standard error), `p_value`, `conf_low` and `conf_high` (the end-points of the confidence intervals) columns in our regression tables to have valid interpretation. Up next:
-### Script of R code -An R script file of all R code used in this chapter is available [here](scripts/07-multiple-regression.R). + diff --git a/08-sampling.Rmd b/08-sampling.Rmd index 8e0c94c39..ea45da4e1 100644 --- a/08-sampling.Rmd +++ b/08-sampling.Rmd @@ -1,4 +1,4 @@ -# (PART) Inference via infer {-} +# (PART) Statistical inference via infer {-} # Sampling {#sampling} @@ -26,6 +26,7 @@ set.seed(76) In this chapter we kick off the third segment of this book, statistical inference, by learning about **sampling**. The concepts behind sampling form the basis of confidence intervals and hypothesis testing, which we'll cover in Chapters \@ref(confidence-intervals) and \@ref(hypothesis-testing) respectively. We will see that the tools that you learned in the data science segment of this book, in particular data visualization and data wrangling, will also play an important role here in the development of your understanding. As mentioned before, the concepts throughout this text all build into a culmination allowing you to "think with data." + ### Needed packages {-} Let's load all the packages needed for this chapter (this assumes you've already installed them). If needed, read Section \@ref(packages) for information on how to install and load R packages. @@ -42,21 +43,24 @@ library(knitr) library(kableExtra) library(patchwork) library(readr) +library(stringr) ``` ---- +*** ## Sampling activity {#sampling-activity} -Let's start with a hand-on activity. +Let's start with a hands-on activity. ### What proportion of this bowl's balls are red? -Take a look at the bowl in Figure \@ref(fig:sampling-exercise-1). It has a certain number of red and and a certain number of white balls, all of equal size. What proportion of this bowl's balls are red? +Take a look at the bowl in Figure \@ref(fig:sampling-exercise-1). It has a certain number of red and a certain number of white balls all of equal size. Furthermore, it appears the bowl has been mixed beforehand as there does not seem to be any particular pattern to the spatial distribution of red and white balls. + +Let's now ask ourselves, what proportion of this bowl's balls are red? ```{r sampling-exercise-1, echo=FALSE, fig.cap="A bowl with red and white balls.", purl=FALSE, out.width = "80%"} knitr::include_graphics("images/sampling_bowl_1.jpg") @@ -64,7 +68,7 @@ knitr::include_graphics("images/sampling_bowl_1.jpg") One way to answer this question would be to perform an exhaustive count: remove each ball individually, count the number of red balls and the number of white balls, and divide the number of red balls by the total number of balls. However this would be a long and tedious process. -### Using shovel once +### Using the shovel once Instead of performing an exhaustive count, let's insert a shovel into the bowl as seen in Figure \@ref(fig:sampling-exercise-2). @@ -78,23 +82,27 @@ Using the shovel we remove a number of balls as seen in Figure \@ref(fig:samplin knitr::include_graphics("images/sampling_bowl_3_cropped.jpg") ``` -Observe that 17 of the balls are red and there are a total of 5 x 10 = 50 balls and thus 0.34 = 34% of the shovel's balls are red. The proportion of balls that are red in this shovel is a guess of the proportion of balls that are red in the entire bowl. While not as exact as doing an exhaustive count, our guess of 34% took much less time and energy to obtain. +Observe that 17 of the balls are red and there are a total of 5 x 10 = 50 balls and thus 0.34 = 34% of the shovel's balls are red. 
We can view the proportion of balls that are red *in this shovel* as a guess of the proportion of balls that are red *in the entire bowl*. While not as exact as doing an exhaustive count, our guess of 34% took much less time and energy to obtain.
-However say we started this activity over from the beginning. In other words, we replace the 50 balls back into the ball and start over. Would we remove exactly 17 red balls again? In other words, would our guess at the proportion of the bowl's balls that are red by exactly 34% again? Maybe?
+However, say we started this activity over from the beginning. In other words, we replace the 50 balls back into the bowl and start over. Would we remove exactly 17 red balls again? In other words, would our guess at the proportion of the bowl's balls that are red be exactly 34% again? Maybe?
-What if we repeated this exercise several times? Would I obtain exactly 17 red balls each time? In other words, would our guess at the proportion of the bowl's balls that are red by exactly 34% every time? Surely not. Let's actually do and observe the results with the help of 33 of our friends.
+What if we repeated this exercise several times? Would we obtain exactly 17 red balls each time? In other words, would our guess at the proportion of the bowl's balls that are red be exactly 34% every time? Surely not. Let's actually do this and observe the results with the help of 33 of our friends.
-### Using shovel 33 times {#student-shovels}
+### Using the shovel 33 times {#student-shovels}
-Each of our 33 friends will do the following: use the shovel to remove 50 balls each, count the number of red balls, use this number to compute the proportion of the 50 balls they removed that are red, return the balls into the bowl, and mix the contents of the bowl a little to not let a previous group;s results influence the next group's set of results.
+Each of our 33 friends will do the following:
-```{r sampling-exercise-3b, echo=FALSE, fig.cap="Repeating sampling activity 33 times.", purl=FALSE, out.width = "20%"}
+- use the shovel to remove 50 balls,
+- count the number of red balls,
+- use this number to compute the proportion of the 50 balls they removed that are red,
+- return the balls into the bowl, and
+- mix the contents of the bowl a little to not let a previous group's results influence the next group's set of results.
+
+```{r sampling-exercise-3b, echo=FALSE, fig.show='hold', fig.cap="Repeating sampling activity 33 times.", purl=FALSE, out.width = "20%"}
# # Need new picture #
-knitr::include_graphics("images/sampling/tactile_2_a.jpg")
-knitr::include_graphics("images/sampling/tactile_2_b.jpg")
-knitr::include_graphics("images/sampling/tactile_2_c.jpg")
+knitr::include_graphics(c("images/sampling/tactile_2_a.jpg", "images/sampling/tactile_2_b.jpg", "images/sampling/tactile_2_c.jpg"))
```
However, before returning the balls into the bowl, they are going to mark the proportion of the 50 balls they removed that are red in a histogram as seen in Figure \@ref(fig:sampling-exercise-4).
@@ -113,10 +121,10 @@ Observe the following about the histogram in Figure \@ref(fig:sampling-exercise-
* At the low end, one group removed 50 balls from the bowl with proportion between 0.20 = 20% and 0.25 = 25% red.
* At the high end, another group removed 50 balls from the bowl with proportion between 0.45 = 45% and 0.5 = 50% red.
-* However the most frequently occuring proportions were between 0.30 = 30% and 0.35 = 35% red, right in the middle of the distribution.
+* However the most frequently occurring proportions were between 0.30 = 30% and 0.35 = 35% red, right in the middle of the distribution.
* The shape of this distribution is somewhat bell-shaped.
-Let's construct this same hand-drawn histogram in R using your data visualization skills that you honed in Chapter \@ref(viz). We saved our 33 groups of friend's proportion red in a data frame `tactile_prop_red` which is included in the `moderndive` package you loaded earlier.
+Let's construct this same hand-drawn histogram in R using your data visualization skills that you honed in Chapter \@ref(viz). We saved our 33 groups of friends' proportions red in a data frame `tactile_prop_red` which is included in the `moderndive` package you loaded earlier.
```{r, eval=FALSE} tactile_prop_red ```
@@ -138,9 +146,9 @@ tactile_prop_red %>% latex_options = c("HOLD_position", "repeat_header")) ```
-Observe for each `group` we have their names, the number of `red_balls` they obtained, and the corresponding proportion out of 50 balls that were red `prop_red`. Observe, we also have a variable `replicate` enumerating each of the 33 groups; we chose this name because each row can be viewed as one instance of a replicated activity: using the shovel to remove 50 balls and computing the proportion of those balls that are red.
+Observe that for each `group` we have their names, the number of `red_balls` they obtained, and the corresponding proportion out of 50 balls that were red named `prop_red`. Observe that we also have a variable `replicate` enumerating each of the 33 groups; we chose this name because each row can be viewed as one instance of a replicated activity: using the shovel to remove 50 balls and computing the proportion of those balls that are red.
-We visualize the distribution of these 33 proportions using a `geom_histogram()` with `binwidth = 0.05` in Figure \@ref(fig:samplingdistribution-tactile), which matches our hand-drawn histogram from the earlier Figure \@ref(fig:sampling-exercise-5). Recall that using a histogram is appropriate since `prop_red` is a numerical variable.
+We visualize the distribution of these 33 proportions using a `geom_histogram()` with `binwidth = 0.05` in Figure \@ref(fig:samplingdistribution-tactile), which is appropriate since the variable `prop_red` is numerical. This computer-generated histogram matches our hand-drawn histogram from the earlier Figure \@ref(fig:sampling-exercise-5).
```{r eval=FALSE} ggplot(tactile_prop_red, aes(x = prop_red)) + geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
@@ -156,56 +164,61 @@ tactile_histogram + title = "Distribution of 33 proportions red") ```
+
### What are we doing here?
-What we just demonstrated in this activity is the statistical concept of sampling. We would like to know the proportion of the bowl's balls that are red. However, because the bowl has a very large number of balls, performing an exhaustive count of the number of red and white balls in the bowl would be very costly, both in terms of both time and energy. We therefore instead mix the balls and extract a sample of 50 balls using the shovel. Using this sample of 50 balls, we approximate the proportion of the bowl's balls that are red using the proportion of the shovel's balls that are red, 17 red balls out of 50 balls = 34% in our earlier example.
-
-Moreover, because we mixed the balls before each use of the shovel, the samples were randomly drawn. Because each sample was drawn at random, the samples were different from each other.
Because the samples were different from each other, we obtained the different proportions red observed in Table \@ref(tab:tactilered). This is known as the concept of *sampling variation*.
+What we just demonstrated in this activity is the statistical concept of *sampling*. We would like to know the proportion of the bowl's balls that are red, but because the bowl has a very large number of balls, performing an exhaustive count of the number of red and white balls in the bowl would be very costly in terms of both time and energy. We therefore extract a sample of 50 balls using the shovel. Using this sample of 50 balls, we estimate the proportion of the bowl's balls that are red using the proportion of the shovel's balls that are red. This estimate in our earlier example was 17 red balls out of 50 balls = 34%. Moreover, because we mixed the balls before each use of the shovel, the samples were randomly drawn. Because each sample was drawn at random, the samples were different from each other. Because the samples were different from each other, we obtained the different proportions red observed in Table \@ref(tab:tactilered). This is known as the concept of *sampling variation*.
-In Section \@ref(sampling-simulation) we'll mimic the hands-on sampling activity we just performed in a *computer simulation*; using a computer will allow us to repeat the above sampling activity much more than 33 times. Using a computer, not only will be able to repeat the activity a very large number of times, but we will also be able to repeat it with different sized shovels.
+In Section \@ref(sampling-simulation) we'll mimic the hands-on sampling activity we just performed in a *computer simulation*; using a computer will allow us to repeat the above sampling activity much more than 33 times. Using a computer, not only will we be able to repeat the hands-on activity a very large number of times, but we will also be able to repeat it using different sized shovels.
-After these simulations, in Section \@ref(sampling-goal) we'll explicitly articulate our goals for this chapter: understanding the concept of sampling variation and the role that sample size plays in this variation.
+The purpose of these simulations is to develop an understanding of two key concepts relating to sampling: sampling variation and the role that sample size plays in this variation. To this end, we'll present you with definitions, terminology, and notation related to sampling in Section \@ref(sampling-framework). As with many disciplines, there are definitions, terminology, and notation that seem very inaccessible and even confusing at first. However, as with many difficult topics, if you truly understand the underlying concepts and practice, practice, practice, you'll be able to master these topics.
-After having armed ourselves with this conceptual understanding of sampling, we'll present you with definitions, terminology, and notation related to sampling in Section \@ref(sampling-framework). As with many disciplines, there are definitions, terminology, and notation that seem very inaccessible and even confusing at first. However, as with many difficult topics, if you truly understand the underlying concepts and practice, practice, practice, you'll be able to master these topics.
-To tie the contents of this chapter to the real-word, we'll present an example of one of the most recognizable uses of sampling: polls.
In Section \@ref(sampling-case-study) we'll look at a particular case study: a 2013 poll on then President Obama's popularity amongst young Americans, conducted by the Harvard Kennedy School's Institute of Politics.
-
-We'll close this chapter by generalizing the above sampling from the bowl activity to other scenarios, distiguishing between *random sampling* and *random assignment*, presenting the theoretical result underpinning all our results, and presenting a few mathematical formulas that relate to the concepts and ideas explored in this chapter.
+To tie the contents of this chapter to the real-world, we'll present an example of one of the most recognizable uses of sampling: polls. In Section \@ref(sampling-case-study) we'll look at a particular case study: a 2013 poll on then President Obama's popularity among young Americans, conducted by the Harvard Kennedy School's Institute of Politics.
+We'll close this chapter by generalizing the above sampling from the bowl activity to other scenarios, distinguishing between *random sampling* and *random assignment*, presenting the theoretical result underpinning all our results, and presenting a few mathematical formulas that relate to the concepts and ideas explored in this chapter.
+
----
+***
## Computer simulation {#sampling-simulation}
-What we performed in Section \@ref(sampling-activity) is a *simulation* of sampling. The crowd-sourced Wikipedia definition of a simulation states: "A simulation is an approximate imitation of the operation of a process or system."^[[Wikipedia entry for simulation](https://en.wikipedia.org/wiki/Simulation)] One example of simulations in practice are a flight simulators: before pilots in training are allowed to fly an actual plane, they first practice on a computer that attempts to mimic the reality of flying an actual plane as best as possible.
+What we performed in Section \@ref(sampling-activity) is a *simulation* of sampling. In other words, we were not in a real-life sampling scenario in order to answer a real-life question, but rather we were mimicking such a scenario with our bowl and shovel. The crowd-sourced Wikipedia definition of a simulation states: "A simulation is an approximate imitation of the operation of a process or system."^[[Wikipedia entry for simulation](https://en.wikipedia.org/wiki/Simulation)] One example of simulations in practice is flight simulators: before pilots in training are allowed to fly an actual plane, they first practice on a computer that attempts to mimic the reality of flying an actual plane as best as possible.
+
+Now you might be thinking that simulations must necessarily take place on a computer. However, this is not necessarily true. Take for example crash test dummies: before cars are made available to the market, automobile engineers test their safety by mimicking the reality for passengers of being in an automobile crash. To distinguish between these two simulation types, we'll term a simulation performed in real life as a "tactile" simulation done with your hands and to the touch as opposed to a "virtual" simulation performed on a computer.
-Now you might be thinking that simulations must necssarily take place on computer. However, this is not necessarily true. Take for example crash test dummies: before cars are made available to the market, automobile engineers test their safety by mimicking the reality for passengeres of being in an automobile crash.
To distinguish between these two simulation types, we'll term a simulation performed in real-life as a "tactile" simulation done with your hands and to the touch as opposed to a "virtual" simulation performed on a computer. + Example of a "tactile" simulation | Example of "virtual" simulation :-------------------------:|:-------------------------: ![](images/crash-test-dummy.jpg){ height=1.7in } | ![](images/flight-simulator.jpg){ height=1.7in } -So while in Section \@ref(sampling-activity) we performed a "tactile" simulation of sampling using an actual bowl and an actual shovel with our hands, in this section we'll perform a "virtual" simulation using a virtual bowl and a virtual shovel with our computers. +So while in Section \@ref(sampling-activity) we performed a "tactile" simulation of sampling using an actual bowl and an actual shovel with our hands, in this section we'll perform a "virtual" simulation using a "virtual" bowl and a "virtual" shovel with our computers. -### Using shovel once +### Using the virtual shovel once -Let's start by perfoming the virtual analogue of the tactile sampling simulation we performed in \@ref(sampling-activity). We first need a virtual analogue of the bowl seen in Figure \@ref(fig:sampling-exercise-1). To this end, we created a data frame called `bowl` whose rows correspond exactly with the contents of the actual bowl; we've included this data frame in the `moderndive` package. +Let's start by performing the virtual analogue of the tactile sampling simulation we performed in \@ref(sampling-activity). We first need a virtual analogue of the bowl seen in Figure \@ref(fig:sampling-exercise-1). To this end, we included a data frame `bowl` in the `moderndive` package whose rows correspond exactly with the contents of the actual bowl. ```{r} bowl ``` -Observe in the output that `bowl` has 2400 rows, telling us that the bowl contains 2400 equally-sized balls. The first variable `ball_ID` is used merely as an "identification variable" for this data frame as discussed in Subsection \@ref(identification-vs-measurement); none of the balls in the actual bowl are marked with numbers. The second variable `color` indicates whether a particular virtual ball i s red or white. Run `View(bowl)` in RStudio and scroll through the contents to convince yourselves that `bowl` is indeed a virtual version of the actual bowl in Figure \@ref(fig:sampling-exercise-1). + -Now that we have a virtual analogue of our bowl, we now need a virtual analogue for the shovel seen in Figure \@ref(fig:sampling-exercise-2) to generate our random samples of 50 balls. We're going to use the `rep_sample_n()` function included in the `moderndive` package that allows us to take `rep`eated/`rep`licated `samples of size `n`. Run the following and explore `virtual_shovel`'s contents in the spreadsheet viewer. +Observe in the output that `bowl` has 2400 rows, telling us that the bowl contains 2400 equally-sized balls. The first variable `ball_ID` is used merely as an "identification variable" for this data frame as discussed in Subsection \@ref(identification-vs-measurement-variables); none of the balls in the actual bowl are marked with numbers. The second variable `color` indicates whether a particular virtual ball is red or white. View the contents of the bowl in RStudio's data viewer and scroll through the contents to convince yourselves that `bowl` is indeed a virtual version of the actual bowl in Figure \@ref(fig:sampling-exercise-1). 
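If you would rather not scroll through all 2400 rows, here is a quick side check you could run instead. This is a sketch only, not part of the chapter's main code; it just uses `dplyr` verbs from Chapter \@ref(wrangling) on the `bowl` data frame loaded above.

```{r, eval=FALSE}
# Sketch only: two quick sanity checks on the virtual bowl.
bowl %>% 
  nrow()            # 2400 balls in total
bowl %>% 
  distinct(color)   # the only colors present are "red" and "white"
```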
+ +Now that we have a virtual analogue of our bowl, we now need a virtual analogue for the shovel seen in Figure \@ref(fig:sampling-exercise-2); we'll use this virtual shovel to generate our virtual random samples of 50 balls. We're going to use the `rep_sample_n()` function included in the `moderndive` package. This function allows us to take `rep`eated, or `rep`licated, `samples` of size `n`. Run the following and explore `virtual_shovel`'s contents in the RStudio viewer. ```{r, eval=FALSE} virtual_shovel <- bowl %>% @@ -230,24 +243,24 @@ virtual_shovel %>% latex_options = c("HOLD_position")) ``` -The `ball_ID` variable identifies which of balls from `bowl` are included in our sample of 50 balls and `color` denotes it's color. However what does the `replicate` variable indicate? In `virtual_shovel`'s case, `replicate` is equal to 1 for all 50 rows. This is telling us that these 50 rows correspond to a first repeated/replicated use of the shovel, in other words our first sample. We'll see below when we "virtually" take 33 samples below, `replicate` will take values between 1 and 33. Before we do this, let's compute the proportion of balls in our virtual sample of size 50 that are red. We'll be using the `dplyr` data wrangling verbs you learned in Chapter \@ref(wrangling). Let's breakdown the steps individually: +The `ball_ID` variable identifies which of the balls from `bowl` are included in our sample of 50 balls and `color` denotes its color. However what does the `replicate` variable indicate? In `virtual_shovel`'s case, `replicate` is equal to 1 for all 50 rows. This is telling us that these 50 rows correspond to a first repeated/replicated use of the shovel, in our case our first sample. We'll see below when we "virtually" take 33 samples, `replicate` will take values between 1 and 33. Before we do this, let's compute the proportion of balls in our virtual sample of size 50 that are red using the `dplyr` data wrangling verbs you learned in Chapter \@ref(wrangling). Let's breakdown the steps individually: -First, for each of our 50 sampled balls, identify if it is red or not using the boolean algebra. For every row where `color == "red"`, the boolean `TRUE` is returned and for every row where `color` is not equal to `"red"`, the boolean `FALSE` is returned. Let's create a new boolean variable `is_red` using the `mutate()` function from Section \@ref(mutate): +First, for each of our 50 sampled balls, identify if it is red using a test for equality using `==`. For every row where `color == "red"`, the Boolean `TRUE` is returned and for every row where `color` is not equal to `"red"`, the Boolean `FALSE` is returned. Let's create a new Boolean variable `is_red` using the `mutate()` function from Section \@ref(mutate): ```{r} virtual_shovel %>% - mutate(is_red = color == "red") + mutate(is_red = (color == "red")) ``` Second, we compute the number of balls out of 50 that are red using the `summarize()` function. Recall from Section \@ref(summarize) that `summarize()` takes a data frame with many rows and returns a data frame with a single row containing summary statistics that you specify, like `mean()` and `median()`. In this case we use the `sum()`: ```{r} virtual_shovel %>% - mutate(is_red = color == "red") %>% + mutate(is_red = (color == "red")) %>% summarize(num_red = sum(is_red)) ``` -Why does this work? Because R treats `TRUE` like the number `1` and `FALSE` like the number `0`. 
So summing the number of `TRUE`'s and `FALSE`'s is equivalent to summing `1`'s and `0`'s, which in the end which counts the number of balls where `color` is `red`. +Why does this work? Because R treats `TRUE` like the number `1` and `FALSE` like the number `0`. So summing the number of `TRUE`'s and `FALSE`'s is equivalent to summing `1`'s and `0`'s, which in the end counts the number of balls where `color` is `red`. In our case, 17 of the 50 balls were red. Third and last, we compute the proportion of the 50 sampled balls that are red by dividing `num_red` by 50: @@ -258,7 +271,7 @@ virtual_shovel %>% mutate(prop_red = num_red / 50) ``` -Let's make the above code a little more compact and succinct by combining the first `mutate()` and the `summarize()` as follows: +In other words, this "virtual" sample's balls were 34% red. Let's make the above code a little more compact and succinct by combining the first `mutate()` and the `summarize()` as follows: ```{r} virtual_shovel %>% @@ -266,16 +279,12 @@ virtual_shovel %>% mutate(prop_red = num_red / 50) ``` -Great! 44% of `virtual_shovel`'s 50 balls were red! So based on this particular sample, our guess at the proportion of `bowl`'s balls that are red is 44%. But remember from our earlier tactile sampling activity, that if we repeated this sampling, we would not necessarily obtain a sample of 50 balls with 44% of them being red; there will likely be some variation. - -In fact in Table \@ref(tab:virtual-shovel) we displayed 33 such proportions based on 33 tactile samples and then in Figure \@ref(fig:sampling-exercise-5) we visualized the distribution of the 33 proportions in a histogram. Let's now perform the virtual analogue of having 33 groups of students use the sampling shovel! - +Great! 34% of `virtual_shovel`'s 50 balls were red! So based on this particular sample, our guess at the proportion of the `bowl`'s balls that are red is 34%. But remember from our earlier tactile sampling activity that if we repeated this sampling, we would not necessarily obtain a sample of 50 balls with 34% of them being red again; there will likely be some variation. In fact in Table \@ref(tab:virtual-shovel) we displayed 33 such proportions based on 33 tactile samples and then in Figure \@ref(fig:sampling-exercise-5) we visualized the distribution of the 33 proportions in a histogram. Let's now perform the virtual analogue of having 33 groups of students use the sampling shovel! -### Using shovel 33 times -Recall however in our tactile sampling exercise in Section \@ref(sampling-activity) above that we had 33 groups of students each use the shovel, yielding 33 samples of size 50 balls, which we used to then compute 33 proportions. In other words we *repeated/replicated* the sampling activity 33 times. We can perform this repeated/replicated sampling virtually by once again using our virtual shovel funciton `rep_sample_n()`, but by adding the `reps = 33` argument indicating we want to repeat the sampling 33 times. +### Using the virtual shovel 33 times -Be sure to scroll through the contents of `virtual_samples` in RStudio's spreadsheet viewer. +Recall that in our tactile sampling exercise in Section \@ref(sampling-activity) we had 33 groups of students each use the shovel, yielding 33 samples of size 50 balls, which we then used to compute 33 proportions. In other words we repeated/replicated using the shovel 33 times. 
We can perform this repeated/replicated sampling virtually by once again using our virtual shovel function `rep_sample_n()`, but by adding the `reps = 33` argument, indicating we want to repeat the sampling 33 times. Be sure to scroll through the contents of `virtual_samples` in RStudio's viewer.
```{r, eval=FALSE} virtual_samples <- bowl %>%
@@ -287,9 +296,9 @@ virtual_samples <- bowl %>% rep_sample_n(size = 50, reps = 33) ```
-Observe that while the first 50 rows of `replicate` are equal to `1` the next 50 are equal to `2`. This is indicating that the first 50 rows correspond to the first sample of 50 balls while the next 50 correspond to the second sample of 50 balls. This pattern continues for all `reps = 33` replicates and thus `virtual_samples` has 33 $\times$ 50 = 1650 rows.
+Observe that while the first 50 rows of `replicate` are equal to `1`, the next 50 rows of `replicate` are equal to `2`. This is telling us that the first 50 rows correspond to the first sample of 50 balls while the next 50 correspond to the second sample of 50 balls. This pattern continues for all `reps = 33` replicates and thus `virtual_samples` has 33 $\times$ 50 = 1650 rows.
-Let's now take the data frame `virtual_samples` with 33 $\times$ 50 = 1650 rows corresponding to 33 samples of size 50 and compute the resulting 33 proportions red. We'll use the same `dplyr` verbs as we did in the previous section, but this time with an additional `group_by()` the `replicate` variable. Recall from Section \@ref(groupby) that by assigning grouping "meta-data" before `summarizing()`, we'll obtain 33 different proportions red:
+Let's now take the data frame `virtual_samples` with 33 $\times$ 50 = 1650 rows corresponding to 33 samples of size 50 balls and compute the resulting 33 proportions red. We'll use the same `dplyr` verbs as we did in the previous section, but this time with an additional `group_by()` of the `replicate` variable. Recall from Section \@ref(groupby) that by assigning the grouping variable "meta-data" before `summarizing()`, we'll obtain 33 different proportions red:
```{r, eval=FALSE} virtual_prop_red <- virtual_samples %>%
@@ -299,7 +308,9 @@ virtual_prop_red <- virtual_samples %>% View(virtual_prop_red) ```
-Let's display only the first 10 out of 33 rows of `virtual_prop_red`'s contents in Table \@ref(tab:tactilered).
+Let's display only the first 10 out of 33 rows of `virtual_prop_red`'s contents in Table \@ref(tab:virtualred). As one would expect, there is variation in the resulting `prop_red` proportions red for the first 10 out of 33 repeated/replicated samples.
+
+
```{r virtualred, echo=FALSE} virtual_prop_red <- virtual_samples %>%
@@ -338,11 +349,11 @@ virtual_histogram + title = "Distribution of 33 proportions red") ```
-Observe that occasionally we obtained proportions red that are less than 0.3 = 30%, while occasionally we obtained proportions that are greater than 0.45 = 45%. However, the most frequently occuring proportions red out of 50 balls were between 35% and 40% (for 11 out 33 samples). Why do we have these differences in proportions red? Because of sampling variation.
+Observe that occasionally we obtained proportions red that are less than 0.3 = 30%, while on the other hand we occasionally obtained proportions that are greater than 0.45 = 45%. However, the most frequently occurring proportions red out of 50 balls were between 35% and 40% (for 11 out of 33 samples). Why do we have these differences in proportions red? Because of sampling variation.
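If you want to check for yourself how many of the 33 virtual proportions landed in each bin of the histogram, one possible approach is sketched below; it assumes the `virtual_prop_red` data frame created above, and since the samples are drawn at random, your counts will differ from ours.

```{r, eval=FALSE}
# Sketch only: tally the 33 virtual proportions red by histogram bin.
# Exact counts will vary from run to run because of sampling variation.
virtual_prop_red %>% 
  mutate(bin = cut(prop_red, breaks = seq(0, 1, by = 0.05))) %>% 
  count(bin)
```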
-Let's now compare our virtual results with our tactile results from the previous section in Figure \@ref(fig:tactile-vs-virtual). We see that both histograms, in other words the distribution of the 33 proportions red, are *somewhat* somewhat similar in their center and spread, although not identical; these slight differences are again due to random variation. Furthermore both distributions are *somewhat* bell-shaped.
+Let's now compare our virtual results with our tactile results from the previous section in Figure \@ref(fig:tactile-vs-virtual). We see that both histograms, in other words the distribution of the 33 proportions red, are *somewhat* similar in their center and spread although not identical. These slight differences are again due to random variation. Furthermore both distributions are *somewhat* bell-shaped.
-```{r tactile-vs-virtual, echo=FALSE, fig.cap="Two distribution of 33 proportions based on 33 samples of size 50"}
+```{r tactile-vs-virtual, echo=FALSE, fig.cap="Comparing 33 virtual and 33 tactile proportions red."}
bind_rows( virtual_prop_red %>% mutate(type = "Virtual sampling"),
@@ -359,11 +370,9 @@ bind_rows( ```
-### Using shovel 1000 times
-
-Now say we want study the variation in proportions red not based on 33 samples but rather a very large number of samples, say 1000 samples. We have two choices at this point. We could make our students manually take 1000 samples of 50 balls and compute the corresponding 1000 proportion red out 50 balls. However, this would be cruel and unusual, as it this would be very tedious and time consuming. This is however where computers excel: for automating long and repetitive tasks and having them performed very quickly. Therefore at this point we will abandon tactile sampling in favor of only virtual sampling. Let's once again use the `rep_sample_n()` function with sample `size` set to 50, but the number of replicates `reps = 1000`.
+### Using the virtual shovel 1000 times
-Be sure to scroll through the contents of `virtual_samples` in RStudio's spreadsheet viewer.
+Now say we want to study the variation in proportions red not based on 33 repeated/replicated samples, but rather a very large number of samples, say 1000 samples. We have two choices at this point. We could have our students manually take 1000 samples of 50 balls and compute the corresponding 1000 proportions red out of 50 balls. This would be cruel and unusual however, as this would be very tedious and time-consuming. This is where computers excel: automating long and repetitive tasks while performing them very quickly. Therefore at this point we will abandon tactile sampling in favor of only virtual sampling. Let's once again use the `rep_sample_n()` function with sample `size` set to 50, but this time with the number of replicates `reps = 1000`. Be sure to scroll through the contents of `virtual_samples` in RStudio's viewer.
```{r, eval=FALSE} virtual_samples <- bowl %>%
@@ -376,7 +385,7 @@ virtual_samples <- bowl %>% ```
-Observe that now `virtual_samples` has 1000 $\times$ 50 = 50,000 rows, instead of the 33 $\times$ 50 = 1650 rows from earlier. Using the same code as earlier, let's take the data frame `virtual_samples` with 1000 $\times$ 50 = 50,000 and compute the resulting 33 proportions red.
+Observe that now `virtual_samples` has 1000 $\times$ 50 = 50,000 rows, instead of the 33 $\times$ 50 = 1650 rows from earlier.
Using the same code as earlier, let's take the data frame `virtual_samples` with 1000 $\times$ 50 = 50,000 and compute the resulting 1000 proportions red. ```{r, eval=FALSE} virtual_prop_red <- virtual_samples %>% @@ -388,6 +397,8 @@ View(virtual_prop_red) Observe that we now have 1000 replicates of `prop_red`, the proportion of 50 balls that are red. Using the same code as earlier, let's now visualize the distribution of these 1000 replicates of `prop_red` in a histogram in Figure \@ref(fig:samplingdistribution-virtual-1000). + + ```{r eval=FALSE} ggplot(virtual_prop_red, aes(x = prop_red)) + geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") + @@ -406,18 +417,20 @@ virtual_histogram + title = "Distribution of 1000 proportions red") ``` -Once again, the most frequently occuring proportions red occur between 35% and 40%. Every now and then, we'd obtain proportions are low as between 20% and 25%, and others as high as between 55% and 60%, but those are rarities. Furthermore observe that we now have much more symmetric and smoother bell-shaped distribution. This distribution is in fact a Normal distribution; see Appendix \@ref(appendixA) for a brief discussion on properties of the Normal distribution. +Once again, the most frequently occurring proportions red occur between 35% and 40%. Every now and then, we obtain proportions as low as between 20% and 25%, and others as high as between 55% and 60%. These are rare however. Furthermore observe that we now have a much more symmetric and smoother bell-shaped distribution. This distribution is in fact a Normal distribution; see Appendix \@ref(appendixA) for a brief discussion on properties of the Normal distribution. + + ### Using different shovels -We ask ourselves a question now. Say you had three choices of shovels to extract a sample of balls and compute the corresponding proportion of balls in the shovel that are red: +Now say instead of just one shovel, you had three choices of shovels to extract a sample of balls with. A shovel with 25 slots | A shovel with 50 slots | A shovel with 100 slots :-------------------------:|:-------------------------:|:-------------------------: ![](images/sampling/shovel_025.jpg){ height=1.7in } | ![](images/sampling/shovel_050.jpg){ height=1.7in } | ![](images/sampling/shovel_100.jpg){ height=1.7in } -Which would you choose? In our experience, most people would choose the shovel with 100 slots since it has the biggest sample size, and thus would yield the "best" guess of the proportion of the bowl's 2400 balls that are red. The three shovels above present with three possible sample sizes. Using our newly developed tools for virtual sampling simulations, let's unpack the effect of having different sample sizes! In other words, for `size = 25`, `size = 50`, and `size = 100`: +If your goal was still to estimate the proportion of the bowl's balls that were red, which shovel would you choose? In our experience, most people would choose the shovel with 100 slots since it has the biggest sample size and hence would yield the "best" guess of the proportion of the bowl's 2400 balls that are red. Using our newly developed tools for virtual sampling simulations, let's unpack the effect of having different sample sizes! In other words, let's use `rep_sample_n()` with `size = 25`, `size = 50`, and `size = 100`, while keeping the number of repeated/replicated samples at 1000: 1. Virtually use the appropriate shovel to generate 1000 samples with `size` balls. 1. 
Compute the resulting 1000 replicated of the proportion of the shovel's balls that are red. @@ -505,17 +518,18 @@ virtual_prop_red_100 <- virtual_samples_100 %>% mutate(prop_red = red / 100) %>% mutate(n = 100) -virtual_prop <- bind_rows(virtual_prop_red_25, virtual_prop_red_50,virtual_prop_red_100) +virtual_prop <- bind_rows(virtual_prop_red_25, virtual_prop_red_50, virtual_prop_red_100) -ggplot(virtual_prop, aes(x = prop_red)) + +comparing_sampling_distributions <- ggplot(virtual_prop, aes(x = prop_red)) + geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") + - labs(x = "Sample proportion red", title = "Comparing the distributions of proportion red for different sample sizes") + + labs(x = "Proportion of shovel's balls that are red", title = "Comparing distributions of proportions red for 3 different shovels.") + facet_wrap(~n) +comparing_sampling_distributions ``` -Observe that as the sample size increases, the spread of the 1000 replicates of the proportion red decreases. In other words, as the sample size increases, there are less differences due to sampling variation, and the distribution centers more tightly around the same value. Eyeballing Figure \@ref(fig:comparing-sampling-distributions), things appear to center more tightly around roughly 40%. +Observe that as the sample size increases, the spread of the 1000 replicates of the proportion red decreases. In other words, as the sample size increases, there are less differences due to sampling variation and the distribution centers more tightly around the same value. Eyeballing Figure \@ref(fig:comparing-sampling-distributions), things appear to center tightly around roughly 40%. -We can be numerically explicit about the amount of spread using the *standard deviation*: a summary statistic that measures the amount of spread and variation within a numerical variable; see Appendix \@ref(appendixA) for a brief discussion on properties of the standard deviation. For all three sample sizes, compute the standard deviation of `sd()` of the 1000 proportions red by running the following data wrangling code. +We can be numerically explicit about the amount of spread in our 3 sets of 1000 values of `prop_red` using the *standard deviation*: a summary statistic that measures the amount of spread and variation within a numerical variable; see Appendix \@ref(appendixA) for a brief discussion on properties of the standard deviation. For all three sample sizes, let's compute the standard deviation of the 1000 proportions red by running the following data wrangling code that uses the `sd()` summary function. ```{r, eval = FALSE} # n = 25 @@ -531,16 +545,18 @@ virtual_prop_red_100 %>% summarize(sd = sd(prop_red)) ``` -Let's compare these 3 measures of spread of the distributions we in Table \@ref(tab:comparing-n). +Let's compare these 3 measures of spread of the distributions in Table \@ref(tab:comparing-n). 
```{r comparing-n, eval=TRUE, echo=FALSE} -virtual_prop %>% +comparing_n_table <- virtual_prop %>% group_by(n) %>% summarize(sd = sd(prop_red)) %>% - rename(`sample size` = n, `standard deviation` = sd) %>% + rename(`Number of slots in shovel` = n, `Standard deviation of proportions red` = sd) + +comparing_n_table %>% kable( digits = 3, - caption = "Comparing the standard deviations of the proportion red for different sample sizes.", + caption = "Comparing standard deviations of proportions red for 3 different shovels.", booktabs = TRUE ) %>% kable_styling(font_size = ifelse(knitr:::is_latex_output(), 10, 16), @@ -550,280 +566,266 @@ virtual_prop %>% As we observed visually in Figure \@ref(fig:comparing-sampling-distributions), as the sample size increases our numerical measure of spread decreases; there is less variation in our proportions red. In other words, as the sample size increases, our guesses at the true proportion of the bowl's balls that are red get more consistent and precise. ---- -## Our goal {#sampling-goal} +*** -Simply put: study the effects of sampling variation -### What is sampling variation? - -### Effect of sample size +## Sampling framework {#sampling-framework} +In both our "hands-on" tactile simulations and our "virtual" simulations using a computer, we used sampling for the purpose of estimation: we extract samples in order to estimate the proportion of the bowl's balls that are red. We used sampling as a cheaper and less-time consuming approach than to do a full census of all the balls. Our virtual simulations all built up to the results shown in Figure \@ref(fig:comparing-sampling-distributions) and Table \@ref(tab:comparing-n), comparing 1000 proportions red based on samples of size 25, 50, and 100. This was our first attempt at understanding two key concepts relating to sampling for estimation: ---- +1. The effect of sampling variation on our estimates. +1. The effect of sample size on sampling variation. +Let's now introduce some terminology and notation as well as statistical definitions related to sampling. Given the number of new words to learn, you will likely have to read these next three subsections multiple times. Keep in mind however that none of the concepts underlying these terminology, notation, and definitions are any different than the concepts underlying our simulations in Sections \@ref(sampling-activity) and \@ref(sampling-simulation); it will simply take time and practice to master them. -## Sampling framework {#sampling-framework} -### Terminology - -Let's now define some concepts and terminology important to understand sampling, being sure to tie things back to the above example. You might have to read this a couple times more as you progress throughout this book, as they are very deeply layered concepts. However as we'll soon see, they are very powerful concepts that open up a whole new world of scientific thinking: - -1. **Population**: The population is a set of $N$ observations of interest. - + Above Ex: Our bowl consisting of $N=2400$ identically-shaped balls. -1. **Population parameter**: A population parameter is a numerical summary value about the population. In most settings, this is a value that's unknown and you wish you knew it. - + Above Ex: The true *population proportion $p$* of the balls in the bowl that are red. - + In this scenario the parameter of interest is the proportion, but in others it could be numerical summary values like the mean, median, etc. -1. 
**Census**: An exhaustive enumeration/counting of all observations in the population in order to compute the population parameter's numerical value *exactly*. - + Above Ex: This corresponds to manually going over all $N=2400$ balls and counting the number that are red, thereby allowing us to compute the population proportion $p$ of the balls that are red exactly. - + When $N$ is small, a census is feasible. However, when $N$ is large, a census can get very expensive, either in terms of time, energy, or money. - + Ex: the Decennial United States census attempts to exhaustively count the US population. Consequently it is a very expensive, but necessary, procedure. -1. **Sampling**: Collecting a sample of size $n$ of observations from the population. Typically the sample size $n$ is much smaller than the population size $N$, thereby making sampling a much cheaper procedure than a census. - + Above Ex: Using the shovel to extract a sample of $n=50$ balls. - + It is important to remember that the lowercase $n$ corresponds to the sample size and uppercase $N$ corresponds to the population size, thus $n \leq N$. -1. **Point estimates/sample statistics**: A summary statistic based on the sample of size $n$ that *estimates* the unknown population parameter. - + Above Ex: it's the *sample proportion $\widehat{p}$* red of the balls in the sample of size $n=50$. - + Key: The sample proportion red $\widehat{p}$ is an *estimate* of the true unknown population proportion red $p$. -1. **Representative sampling**: A sample is said be a *representative sample* if it "looks like the population." In other words, the sample's characteristics are a good representation of the population's characteristics. - + Above Ex: Does our sample of $n=50$ balls "look like" the contents of the larger set of $N=2400$ balls in the bowl? -1. **Generalizability**: We say a sample is *generalizable* if any results of based on the sample can generalize to the population. - + Above Ex: Is $\widehat{p}$ a "good guess" of $p$? - + In other words, can we *infer* about the true proportion of the balls in the bowl that are red, based on the results of our sample of $n=50$ balls? -1. **Bias**: In a statistical sense, we say *bias* occurs if certain observations in a population have a higher chance of being sampled than others. We say a sampling procedure is *unbiased* if every observation in a population had an equal chance of being sampled. - + Above Ex: Did each ball, irrespective of color, have an equal chance of being sampled, meaning the sampling was unbiased? We feel since the balls are all of the same size, there isn't any bias in the sampling. If, say, the red balls had a much larger diameter than the white ones then you might have have a higher or lower probability of now sampling red balls. -1. **Random sampling**: We say a sampling procedure is *random* if we sample randomly from the population in an unbiased fashion. - + Above Ex: As long as you mixed the bowl sufficiently before sampling, your samples of size $n=50$ balls would be random. - -### Sampling for inference - -Why did we go through the trouble of enumerating all the above concepts and terminology? 
-
-**The moral of the story**:
+### Terminology & notation

-> * If the sampling of a sample of size $n$ is done at **random**, then
-> * The sample is **unbiased** and **representative** of the population, thus
-> * Any result based on the sample can **generalize** to the population, thus
-> * The **point estimate/sample statistic** is a "good guess" of the unknown population parameter of interest
+Here is a list of terminology and mathematical notation relating to sampling. For each item, we'll be sure to tie them to our simulations in Sections \@ref(sampling-activity) and \@ref(sampling-simulation).

-**and thus we have inferred about the population based on our sample. In the above example**:
+1. **(Study) Population**: A (study) population is a collection of individuals or observations about which we are interested. We mathematically denote the population's size using upper case $N$. In our simulations, the (study) population was the collection of $N$ = 2400 identically sized red and white balls contained in the bowl.
+1. **Population parameter**: A population parameter is a numerical summary quantity about the population that is unknown, but you wish you knew. For example, when this quantity is a mean, the population parameter of interest is the *population mean*, which is mathematically denoted with the Greek letter $\mu$ (pronounced "mu"). In our simulations, however, since we were interested in the proportion of the bowl's balls that were red, the population parameter is the *population proportion*, which is mathematically denoted with the letter $p$.
+1. **Census**: An exhaustive enumeration or counting of all $N$ individuals or observations in the population in order to compute the population parameter's value *exactly*. In our simulations, this would correspond to manually going over all $N$ = 2400 balls in the bowl, counting the number that are red, and computing the population proportion $p$ of the balls that are red *exactly*. When the number $N$ of individuals or observations in our population is large, as was the case with our bowl, a census can be very expensive in terms of time, energy, and money.
+1. **Sampling**: Sampling is the act of collecting a sample from the population when we don't have the means to perform a census. We mathematically denote the sample's size using lower case $n$, as opposed to upper case $N$ which denotes the population's size. Typically the sample size $n$ is much smaller than the population size $N$, thereby making sampling a much cheaper procedure than a census. In our simulations, we used shovels with 25, 50, and 100 slots to extract samples of size $n$ = 25, $n$ = 50, and $n$ = 100 balls.
+1. **Point estimate (AKA sample statistic)**: A summary statistic computed from the sample that *estimates* the unknown population parameter. In our simulations, recall that the unknown population parameter was the population proportion and that this is mathematically denoted with $p$. Our point estimate is the *sample proportion*: the proportion of the shovel's balls that are red. In other words, it is our guess of the proportion of the bowl's balls that are red. We mathematically denote the sample proportion using $\widehat{p}$; the "hat" on top of the $p$ indicates that it is an estimate of the unknown population proportion $p$.
+1. **Representative sampling**: A sample is said to be a *representative sample* if it is representative of the population.
In other words, are the sample's characteristics a good representation of the population's characteristics? In our simulations, are the samples of $n$ balls extracted using our shovels representative of the bowl's $N$=2400 balls?
+1. **Generalizability**: We say a sample is *generalizable* if any results based on the sample can generalize to the population. In other words, can the value of the point estimate be generalized to estimate the value of the population parameter well? In our simulations, can we generalize the values of the sample proportions red of our shovels to the population proportion red of the bowl? Using mathematical notation, is $\widehat{p}$ a "good guess" of $p$?
+1. **Bias**: In a statistical sense, we say *bias* occurs if certain individuals or observations in a population have a higher chance of being included in a sample than others. We say a sampling procedure is *unbiased* if every observation in a population has an equal chance of being sampled. In our simulations, since each ball had the same size and hence an equal chance of being sampled in our shovels, our samples were unbiased.
+1. **Random sampling**: We say a sampling procedure is *random* if we sample randomly from the population in an unbiased fashion. In our simulations, this would correspond to sufficiently mixing the bowl before each use of the shovel.

+Phew, that's a lot of new terminology and notation to learn! Let's put them all together to describe the paradigm of sampling:

+> * If the sampling of a sample of size $n$ is done at **random**, then
+> * the sample is **unbiased** and **representative** of the population of size $N$, thus
+> * any result based on the sample can **generalize** to the population, thus
+> * the point estimate is a **"good guess"** of the unknown population parameter, thus
+> * instead of performing a census, we can **infer** about the population using sampling.

+Restricting consideration to a shovel with 50 slots from our simulations,

+> * If we extract a sample of $n=50$ balls at **random**, in other words we mix the equally-sized balls before using the shovel, then
+> * the contents of the shovel are an **unbiased representation** of the contents of the bowl's 2400 balls, thus
+> * any result based on the sample of balls can **generalize** to the bowl, thus
+> * the sample proportion $\widehat{p}$ of the $n=50$ balls in the shovel that are red is a **"good guess"** of the population proportion $p$ of the $N$=2400 balls that are red, thus
+> * instead of manually going over all the balls in the bowl, we can **infer** about the bowl using the shovel.
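To tie this terminology and notation to code, here is a small optional sketch that is not part of the chapter's main analysis. It assumes the `bowl` data frame (from the `moderndive` package) and the `rep_sample_n()` function used earlier in this chapter are available; the first pipeline performs a census to compute the population proportion $p$ exactly, while the second computes a single point estimate $\widehat{p}$ from one random sample of size $n$ = 50.

```{r, eval=FALSE}
library(dplyr)
library(moderndive)  # bowl data frame; rep_sample_n() is also available in the infer package

# Census: use all N = 2400 balls to compute the population proportion p exactly
bowl %>%
  summarize(p = mean(color == "red"))

# Sampling: compute one point estimate p-hat from a single random sample of n = 50 balls
bowl %>%
  rep_sample_n(size = 50, reps = 1) %>%
  summarize(p_hat = mean(color == "red"))
```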
-In the case of the histogram in Figure \@ref(fig:samplingdistribution-tactile), its the distribution of the sample proportion red $\widehat{p}$ based on $n=50$ sampled balls from the bowl, for which we want to estimate the unknown *population proportion* $p$ of the $N=2400$ balls that are red. Sampling distributions describe how values of the sample proportion red $\widehat{p}$ will vary from sample to sample due to **sampling variability** and thus identify "typical" and "atypical" values of $\widehat{p}$. For example +Note that last word we wrote in bold: **infer**. The act of "inferring" is to deduce or conclude (information) from evidence and reasoning. In our simulations, we wanted to infer about the proportion of the bowl's balls that are red. *Statistical inference* is the theory, methods, and practice of forming judgments about the parameters of a population and the reliability of statistical relationships, typically on the basis of random sampling (Wikipedia). In other words, statistical inference is the act of inference via sampling. In the upcoming Chapter \@ref(confidence-intervals) on confidence intervals, we'll introduce the `infer` package, which makes statistical inference "tidy" and transparent. It is why this third portion of the book is called "Statistical inference via infer". -* Obtaining a sample that yields $\widehat{p} = 0.36$ would be considered typical, common, and plausible since it would in theory occur frequently. -* Obtaining a sample that yields $\widehat{p} = 0.8$ would be considered atypical, uncommon, and implausible since it lies far away from most of the distribution. +### Statistical definitions -Let's now ask ourselves the following questions: +Now for some important statistical definitions related to sampling. As a refresher of our 1000 repeated/replicated virtual samples of size $n$ = 25, $n$ = 50, and $n$ = 100 in Section \@ref(sampling-simulation), let's display Figure \@ref(fig:comparing-sampling-distributions) again below. -1. Where is the sampling distribution centered? -1. What is the spread of this sampling distribution? +```{r echo=FALSE} +comparing_sampling_distributions +``` -Recall from Section \@ref(summarize) the mean and the standard deviation are two summary statistics that would answer this question: +These types of distributions have a special name: **sampling distributions**; their visualization displays the effect of sampling variation on the distribution of any point estimate, in this case the sample proportion $\widehat{p}$. Using these sampling distributions, for a given sample size $n$, we can make statements about what values we can typically expect. For example, observe the centers of all three sampling distributions: they are all roughly centered around 0.4 = 40%. Furthermore, observe that while we are somewhat likely to observe sample proportions red of 0.2 = 20% when using the shovel with 25 slots, we will almost never observe this sample proportion when using the shovel with 100 slots. Observe also the effect of sample size on the sampling variation. As the sample size $n$ increases from 25 to 50 to 100, the spread/variation of the sampling distribution decreases and thus the values cluster more and more tightly around the same center of around 40%. 
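To make statements like "somewhat likely" and "almost never" a little more concrete, we can simply count replicates. The following optional sketch, which assumes the `virtual_prop` data frame created above is still in your workspace, computes for each shovel the fraction of the 1000 sample proportions that were 0.2 = 20% or less:

```{r, eval=FALSE}
# For each sample size n, what fraction of the 1000 sample proportions red
# were 20% or less?
virtual_prop %>%
  group_by(n) %>%
  summarize(prop_at_most_20_percent = mean(prop_red <= 0.2))
```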
We quantified this spread/variation using the standard deviation of our proportions in Table \@ref(tab:comparing-n), which we display again below: -```{r, eval=FALSE} -tactile_prop_red %>% - summarize(mean = mean(prop_red), sd = sd(prop_red)) -``` -```{r, echo=FALSE} -summary_stats <- tactile_prop_red %>% - summarize(mean = mean(prop_red), sd = sd(prop_red)) -summary_stats %>% - kable(digits = 3) %>% - kable_styling(font_size = ifelse(knitr:::is_latex_output(), 10, 16), - latex_options = c("HOLD_position")) +```{r, eval=TRUE, echo=FALSE} +comparing_n_table %>% + kable(digits = 3) ``` -Finally, it's important to keep in mind: +So as the number of slots in the shovel increased, this standard deviation decreased. These types of standard deviations have another special name: **standard errors**; they quantify the effect of sampling variation induced on our estimates. In other words, they are quantifying how much we can expect different proportions of a shovel's balls that are red to vary from random sample to random sample. -1. If the sampling is done in an unbiased and random fashion, in other words we made sure to stir the bowl before we sampled, then the sampling distribution will be guaranteed to be centered at the true unknown population proportion red $p$, or in other words the true number of balls out of 2400 that are red. -1. The spread of this histogram, as quantified by the standard deviation of `r summary_stats %>% pull(sd) %>% round(3)`, is called the **standard error**. It quantifies the uncertainty of our estimates of $p$, which recall are called $\widehat{p}$. - + **Note**: A large source of confusion. All standard errors are a form of standard deviation, but not all standard deviations are standard errors. +Unfortunately, many new statistics practitioners get confused by these names. For example, it's common for people new to statistical inference to call the "sampling distribution" the "sample distribution". Another additional source of confusion is the name "standard deviation" and "standard error". Remember that a standard error is merely a *kind* of standard deviation: the standard deviation of any point estimate from a sampling scenario. In other words, all standard errors are standard deviations, but not all standard deviations are a standard error. +To help reinforce these concepts, let's re-display Figure \@ref(fig:comparing-sampling-distributions) but using our new terminology, notation, and definitions relating to sampling in Figure \@ref(fig:comparing-sampling-distributions-2). -* sampling distribution -* standard error +```{r comparing-sampling-distributions-2, echo=FALSE, fig.cap="Three sampling distributions of the sample proportion $\\widehat{p}$."} +virtual_prop %>% + mutate( + n = str_c("n = ", n), + n = factor(n, levels = c("n = 25", "n = 50", "n = 100")) + ) %>% + ggplot( aes(x = prop_red)) + + geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") + + labs(x = expression(paste("Sample proportion ", hat(p))), + title = expression(paste("Sampling distributions of the sample proportion ", hat(p), " based on n = 25, 50, 100.")) ) + + facet_wrap(~n) +``` - +Furthermore, let's re-display Table \@ref(tab:comparing-n) but using our new terminology, notation, and definitions relating to sampling in Table \@ref(tab:comparing-n-2). 
+```{r comparing-n-2, eval=TRUE, echo=FALSE} +comparing_n_table <- virtual_prop %>% + group_by(n) %>% + summarize(sd = sd(prop_red)) %>% + mutate( + n = str_c("n = ", n), + n = factor(n, levels = c("n = 25", "n = 50", "n = 100")) + ) %>% + rename(`Sample size` = n, `Standard error of $\\widehat{p}$` = sd) +comparing_n_table %>% + kable( + digits = 3, + caption = "Three standard errors of the sample proportion $\\widehat{p}$ based on n = 25, 50, 100. ", + booktabs = TRUE +) %>% + kable_styling(font_size = ifelse(knitr:::is_latex_output(), 10, 16), + latex_options = c("HOLD_position")) +``` -Now let's mimic the above *tactile* sampling, but with *virtual* sampling. We'll resort to virtual sampling because while collecting 33 tactile samples manually is feasible, for large numbers like 1000, things start getting tiresome! That's where a computer can really help: computers excel at performing mundane tasks repeatedly; think of what accounting software must be like! + -In Figure \@ref(fig:samplingdistribution-virtual), we can start seeing a pattern in the sampling distribution emerge. However, 33 values of the sample proportion $\widehat{p}$ might not be enough to get a true sense of the distribution. Using 1000 values of $\widehat{p}$ would definitely give a better sense. What are our two options for constructing these histograms? +Remember the key message of this last table: that as the sample size $n$ goes up, the "typical" error of your point estimate as quantified by the standard error will go down. -1. Tactile sampling: Make the 33 groups of students take $1000 / 33 \approx 31$ samples of size $n=50$ each, count the number of red balls for each of the 1000 tactile samples, and then compute the 1000 corresponding values of the sample proportion $\widehat{p}$. However, this would be cruel and unusual as this would take hours! -1. Virtual sampling: Computers are very good at automating repetitive tasks such as this one. This is the way to go! -First, generate 1000 samples of size $n=50$ +### The moral of the story -```{r, eval=FALSE} -virtual_samples <- bowl %>% - rep_sample_n(size = 50, reps = 1000) -View(virtual_samples) -``` -```{r, echo=FALSE} -virtual_samples <- bowl %>% - rep_sample_n(size = 50, reps = 1000) -``` +Let's recap this section so far. We've seen that if a sample is generated at random, then the resulting point estimate is a "good guess" of the true unknown population parameter. In our simulations, since we made sure to mix the balls first before extracting a sample with the shovel, the resulting sample proportion $\widehat{p}$ of the shovel's balls that were red was a "good guess" of the population proportion $p$ of the bowl's balls that were red. -Then for each of these 1000 samples of size $n=50$, compute the corresponding sample proportions +However, what do we mean by our point estimate being a "good guess"? While sometimes we'll obtain a point estimate less than the true value of the unknown population parameter, other times we'll obtain a point estimate greater than the true value of the unknown population parameter, this is because of sampling variation. However despite this sampling variation, our point estimates will "on average" be correct. In our simulations, sometimes our sample proportion $\widehat{p}$ was less than the true population proportion $p$, other times the sample proportion $\widehat{p}$ was greater than the true population proportion $p$. This was due to the sampling variability induced by the mixing. 
However despite this sampling variation, our sample proportions $\widehat{p}$ were always centered around the true population proportion. This is also known as having an **accurate** estimate.

-```{r, eval=FALSE}
-virtual_prop_red <- virtual_samples %>%
-  group_by(replicate) %>%
-  summarize(red = sum(color == "red")) %>%
-  mutate(prop_red = red / 50)
-View(virtual_prop_red)
-````
-```{r, echo=FALSE}
-virtual_prop_red <- virtual_samples %>%
-  group_by(replicate) %>%
-  summarize(red = sum(color == "red")) %>%
-  mutate(prop_red = red / 50)
```

+What was the value of the population proportion $p$ of the $N$ = 2400 balls in the actual bowl? There were 900 red balls, for a proportion red of 900/2400 = 0.375 = 37.5%! How do we know this? Did the authors do an exhaustive count of all the balls? No! They were listed on the contents of the box that the bowl came in. Hence we made the contents of the virtual `bowl` match the tactile bowl:
+
+```{r}
+bowl %>%
+  summarize(sum_red = sum(color == "red"),
+            sum_not_red = sum(color != "red"))
```

+Let's re-display our sampling distributions from Figures \@ref(fig:comparing-sampling-distributions) and \@ref(fig:comparing-sampling-distributions-2), but now with a vertical red line marking the true population proportion $p$ of balls that are red = 37.5% in Figure \@ref(fig:comparing-sampling-distributions-3). We see that while there is a certain amount of error in the sample proportions $\widehat{p}$ for all three sampling distributions, on average the $\widehat{p}$ are centered at the true population proportion red $p$.

+```{r comparing-sampling-distributions-3, echo=FALSE, fig.cap="Three sampling distributions with population proportion $p$ marked in red."}
+p <- bowl %>%
+  summarize(p = mean(color == "red")) %>%
+  pull(p)
+virtual_prop %>%
+  mutate(
+    n = str_c("n = ", n),
+    n = factor(n, levels = c("n = 25", "n = 50", "n = 100"))
+  ) %>%
+  ggplot( aes(x = prop_red)) +
+  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
+  labs(x = expression(paste("Sample proportion ", hat(p))),
+       title = expression(paste("Sampling distributions of the sample proportion ", hat(p), " based on n = 25, 50, 100.")) ) +
+  facet_wrap(~n) +
+  geom_vline(xintercept = p, col = "red", size = 1)
```

-```{r echo=FALSE, fig.cap="Sampling distribution of 1000 sample proportions based on 1000 tactile samples with n=50"}
-virtual_prop_red <- virtual_samples %>%
-  group_by(replicate) %>%
-  summarize(red = sum(color == "red")) %>%
-  mutate(prop_red = red / 50)
-ggplot(virtual_prop_red, aes(x = prop_red)) +
-  geom_histogram(binwidth = 0.05, color = "white") +
-  labs(
-    x = expression(paste("Sample proportion red ", hat(p), " based on n = 50")),
-    title = expression(paste("Sampling distribution of ", hat(p)))
-  )
```

+We also saw in this section that as your sample size $n$ increases, your point estimates will vary less and less and be more and more concentrated around the true population parameter; this is quantified by the decreasing standard error. In other words, the typical error of your point estimates will decrease.
In our simulations, as the sample size increases, the spread/variation of our sample proportions $\widehat{p}$ around the true population proportion $p$ decreases. You can observe this behavior as well in Figure \@ref(fig:comparing-sampling-distributions-3). This is also known as having a more **precise** estimate.

-Since the sampling is random and thus representative and unbiased, the above sampling distribution is centered at the true population proportion red $p$ of all $N=2400$ balls in the bowl. Eyeballing it, the sampling distribution appears to be centered at around 0.375.
+So random sampling ensures our point estimates are accurate, while having a large sample size ensures our point estimates are precise. While accuracy and precision may sound like the same concept, they are actually not. Accuracy relates to how "on target" our estimates are, whereas precision relates to how "consistent" our estimates are. Figure \@ref(fig:accuracy-vs-precision) illustrates the difference.

-What is the standard error of the above sampling distribution of $\widehat{p}$ based on 1000 samples of size $n=50$?
+
```{r accuracy-vs-precision, echo=FALSE, fig.cap="Comparing accuracy and precision", purl=FALSE, out.width = "50%"}
knitr::include_graphics("images/accuracy_vs_precision.jpg")
```

-```{r}
-virtual_prop_red %>%
-  summarize(SE = sd(prop_red))
```

-What this value is saying might not be immediately apparent by itself to someone who is new to sampling. It's best to first compare different standard errors for different sampling schemes based on different sample sizes $n$. We'll do so for samples of size $n=25$, $n=50$, and $n=100$ next.
+At this point you might be asking yourself: "If you already knew the true proportion of the bowl's balls that are red was 37.5%, then why did we do any of this?" In other words, "If you already knew the value of the true unknown population parameter, then why did we do any sampling?" You might also be asking: "Why did we take 1000 repeated/replicated samples of size n = 25, 50, and 100? Shouldn't we be taking only *one* sample that's as large as possible?" Recall our definition of a simulation from Section \@ref(sampling-simulation): an approximate imitation of the operation of a process or system. We performed these simulations to study:

----

+1. The effect of sampling variation on our estimates.
+1. The effect of sample size on sampling variation.

+In a real-life scenario, we won't know what the true value of the population parameter is, and furthermore we won't take repeated/replicated samples but rather a single sample that's as large as we can afford. This was also done to show the power of the technique of sampling when trying to estimate a population parameter. Since we knew the value was 37.5%, we could show just how well the different sample sizes approximated this value in their sampling distributions. We present one case study of a real-life sampling scenario in the next section: polling.

-## Interpretation {#sampling-intepretation}

-At this point, you might be saying to yourself: "Big deal, why do we care about this bowl?" As hopefully you'll soon come to appreciate, this sampling bowl exercise is merely a **simulation** representing the reality of many important sampling scenarios in a simplified and accessible setting. One in particular sampling scenario is familiar to many: polling.
Whether for market research or for political purposes, polls inform much of the world's decision and opinion making, and understanding the mechanism behind them can better inform you statistical citizenship. We'll tie-in everything we learn in this chapter with an example relating to a 2013 poll on President Obama's approval ratings among young adults in Section \@ref(polls). - - - ---- +*** ## Case study: Polls {#sampling-case-study} -In December 4, 2013 National Public Radio reported on a recent poll of President Obama's approval rating among young Americans aged 18-29 in an article [Poll: Support For Obama Among Young Americans Eroding](https://www.npr.org/sections/itsallpolitics/2013/12/04/248793753/poll-support-for-obama-among-young-americans-eroding). A quote from the article: +In December 4, 2013 National Public Radio in the US reported on a recent, at the time, poll of President Obama's approval rating among young Americans aged 18-29 in an article [Poll: Support For Obama Among Young Americans Eroding](https://www.npr.org/sections/itsallpolitics/2013/12/04/248793753/poll-support-for-obama-among-young-americans-eroding). A quote from the article: > After voting for him in large numbers in 2008 and 2012, young Americans are souring on President Obama. > > According to a new Harvard University Institute of Politics poll, just 41 percent of millennials — adults ages 18-29 — approve of Obama's job performance, his lowest-ever standing among the group and an 11-point drop from April. -Let's tie elements of this story using the concepts and terminology we learned at the outset of this chapter along with our observations from the tactile and virtual sampling simulations: +Let's tie elements of the real-life poll in this new article with our "tactile" and "virtual" simulations from Sections \@ref(sampling-activity) and \@ref(sampling-simulation) using the terminology, notations, and definitions we learned in Section \@ref(sampling-framework). -1. **Population**: Who is the population of $N$ observations of interest? - + Bowl: $N=2400$ identically-shaped balls - + Obama poll: $N = \text{?}$ young Americans aged 18-29 +1. **(Study) Population**: Who is the population of $N$ individuals or observations of interest? + + Simulation: $N$ = 2400 identically-sized red and white balls + + Obama poll: $N$ = ? young Americans aged 18-29 1. **Population parameter**: What is the population parameter? - + Bowl: The true population proportion $p$ of the balls in the bowl that are red. - + Obama poll: The true population proportion $p$ of young Americans who approve of Obama's job performance. -1. **Census**: What would a census be in this case? - + Bowl: Manually going over all $N=2400$ balls and exactly computing the population proportion $p$ of the balls that are red. - + Obama poll: Locating all $N = \text{?}$ young Americans (which is in the millions) and asking them if they approve of Obama's job performance. This would be quite expensive to do! -1. **Sampling**: How do you acquire the sample of size $n$ observations? - + Bowl: Using the shovel to extract a sample of $n=50$ balls. - + Obama poll: One way would be to get phone records from a database and pick out $n$ phone numbers. In the case of the above poll, the sample was of size $n=2089$ young adults. -1. **Point estimates/sample statistics**: What is the summary statistic based on the sample of size $n$ that *estimates* the unknown population parameter? 
- + Bowl: The *sample proportion $\widehat{p}$* red of the balls in the sample of size $n=50$. - + Key: The sample proportion red $\widehat{p}$ of young Americans in the sample of size $n=2089$ that approve of Obama's job performance. In this study's case, $\widehat{p} = 0.41$ which is the quoted 41% figure in the article. -1. **Representative sampling**: Is the sample procedure *representative*? In other words, to the resulting samples "look like" the population? - + Bowl: Does our sample of $n=50$ balls "look like" the contents of the larger set of $N=2400$ balls in the bowl? - + Obama poll: Does our sample of $n=2089$ young Americans "look like" the population of all young Americans aged 18-29? + + Simulation: The population proportion $p$ of ALL the balls in the bowl that are red. + + Obama poll: The population proportion $p$ of ALL young Americans who approve of Obama's job performance. +1. **Census**: What would a census look like? + + Simulation: Manually going over all $N$ = 2400 balls and exactly computing the population proportion $p$ of the balls that are red, a time consuming task. + + Obama poll: Locating all $N$ = ? young Americans and asking them all if they approve of Obama's job performance, an expensive task. +1. **Sampling**: How do you collect the sample of size $n$ individuals or observations? + + Simulation: Using a shovel with $n$ slots. + + Obama poll: One method is to get a list of phone numbers of all young Americans and pick out $n$ phone numbers. In this poll's case, the sample size of this poll was $n$ = 2089 young Americans. +1. **Point estimate (AKA sample statistic)**: What is your estimate of the unknown population parameter? + + Simulation: The sample proportion $\widehat{p}$ of the balls in the shovel that were red. + + Obama poll: The sample proportion $\widehat{p}$ of young Americans in the sample that approve of Obama's job performance. In this poll's case, $\widehat{p}$ = 0.41 = 41%, the quoted percentage in the second paragraph of the article. +1. **Representative sampling**: Is the sampling procedure *representative*? + + Simulation: Are the contents of the shovel representative of the contents of the bowl? + + Obama poll: Is the sample of $n$ = 2089 young Americans representative of all young Americans aged 18-29? 1. **Generalizability**: Are the samples *generalizable* to the greater population? - + Bowl: Is $\widehat{p}$ a "good guess" of $p$? - + Obama poll: Is $\widehat{p} = 0.41$ a "good guess" of $p$? In other words, can we confidently say that 41% of *all* young Americans approve of Obama. + + Simulation: Is the sample proportion $\widehat{p}$ of the shovel's balls that are red a "good guess" of the population proportion $p$ of the bowl's balls that are red? + + Obama poll: Is the sample proportion $\widehat{p}$ = 0.41 of the sample of young Americans who support Obama a "good guess" of the population proportion $p$ of all young Americans who support Obama? In other words, can we confidently say that 41% of *all* young Americans approve of Obama? 1. **Bias**: Is the sampling procedure unbiased? In other words, do all observations have an equal chance of being included in the sample? - + Bowl: Here, I would say it is unbiased. All balls are equally sized as evidenced by the slots of the $n=50$ shovel, and thus no particular color of ball can be favored in our samples over others. - + Obama poll: Did all young Americans have an equal chance at being represented in this poll? 
For example, if this was conducted using a database of only mobile phone numbers, would people without mobile phones be included? What about if this were an internet poll on a certain news website? Would non-readers of this this website be included? + + Simulation: Since each ball was equally sized, each ball had an equal chance of being included in a shovel's sample, and hence the sampling was unbiased. + + Obama poll: Did all young Americans have an equal chance at being represented in this poll? For example, if this was conducted using only mobile phone numbers, would people without mobile phones be included? What if those who disapproved of Obama were less likely to agree to take part in the poll? What about if this were an internet poll on a certain news website? Would non-readers of this website be included? We need to ask the Harvard University Institute of Politics pollsters about their *sampling methodology*. 1. **Random sampling**: Was the sampling random? - + Bowl: As long as you mixed the bowl sufficiently before sampling, your samples would be random? - + Obama poll: Random sampling is a necessary assumption for all of the above to work. Most articles reporting on polls take this assumption as granted. In our Obama poll, you'd have to ask the group that conducted the poll: The Harvard University Institute of Politics. + + Simulation: As long as you mixed the bowl sufficiently before sampling, your samples would be random. + + Obama poll: Was the sample conducted at random? We need to ask the Harvard University Institute of Politics pollsters about their *sampling methodology*. -Recall the punchline of all the above: +Once again, let's revisit the sampling paradigm: > * If the sampling of a sample of size $n$ is done at **random**, then -> * The sample is **unbiased** and **representative** of the population, thus -> * Any result based on the sample can **generalize** to the population, thus -> * The **point estimate/sample statistic** is a "good guess" of the unknown population parameter of interest +> * the sample is **unbiased** and **representative** of the population of size $N$, thus +> * any result based on the sample can **generalize** to the population, thus +> * the point estimate is a **"good guess"** of the unknown population parameter, thus +> * instead of performing a census, we can **infer** about the population using sampling. -and thus we have *inferred* about the population based on our sample. In the bowl example: +In our simulations using the shovel with 50 slots: -> * If we properly mix the balls by say stirring the bowl first, then use the shovel to extract a sample of size $n=50$, then -> * The contents of the shovel will "look like" the contents of the bowl, thus -> * Any results based on the sample of $n=50$ balls can generalize to the large bowl of $N=2400$ balls, thus -> * The sample proportion $\widehat{p}$ of the $n=50$ sampled balls in the shovel that are red is a "good guess" of the true population proportion $p$ of the $N=2400$ balls that are red. 
+> * If we extract a sample of $n$ = 50 balls at **random**, in other words we mix the equally-sized balls before using the shovel, then
+> * the contents of the shovel are an **unbiased representation** of the contents of the bowl's 2400 balls, thus
+> * any result based on the sample of balls can **generalize** to the bowl, thus
+> * the sample proportion $\widehat{p}$ of the $n$ = 50 balls in the shovel that are red is a **"good guess"** of the population proportion $p$ of the $N$ = 2400 balls that are red, thus
+> * instead of manually going over all the balls in the bowl, we can **infer** about the bowl using the shovel.

-and thus we have inferred some new piece of information about the bowl based on our sample extracted by shovel: the proportion of balls that are red. In the Obama poll example:
+In the real-life Obama poll:

-> * If we had a way of contacting a randomly chosen sample of 2089 young Americans and poll their approval of Obama, then
-> * These 2089 young Americans would "look like" the population of all young Americans, thus
-> * Any results based on this sample of 2089 young Americans can generalize to entire population of all young Americans, thus
-> * The reported sample approval rating of 41% of these 2089 young Americans is a "good guess" of the true approval rating amongst *all* young Americans.
+> * If we had a way of contacting a **randomly** chosen sample of 2089 young Americans and polling their approval of Obama, then
+> * these 2089 young Americans would be an **unbiased** and **representative** sample of *all* young Americans, thus
+> * any results based on this sample of 2089 young Americans can **generalize** to the entire population of all young Americans, thus
+> * the reported sample approval rating of 41% of these 2089 young Americans is a **good guess** of the true approval rating among all young Americans, thus
+> * instead of performing a highly costly census of all young Americans, we can **infer** about all young Americans using polling.

-So long story short, this poll's guess of Obama's approval rating was 41%. However is this the end of the story when understanding the results of a poll? If you read further in the article, it states:

-> The online survey of 2,089 adults was conducted from Oct. 30 to Nov. 11, just weeks after the federal government shutdown ended and the problems surrounding the implementation of the Affordable Care Act began to take center stage. The poll's margin of error was plus or minus 2.1 percentage points.

-Note the term *margin of error*, which here is plus or minus 2.1 percentage points. This is saying that a typical range of errors for polls of this type is about $\pm 2.1\%$, in words from about 2.1% too small to about 2.1% too big. These errors are caused by *sampling variation*, the same sampling variation you saw studied in the histograms in Sections \@ref(tactile) on our tactile sampling simulations and Sections \@ref(virtual) on our virtual sampling simulations.

-In this case of polls, any variation from the true approval rating is an "error" and a reasonable range of errors is the margin of error. We'll see in the next chapter that this what's known as a 95% confidence interval for the unknown approval rating. We'll study confidence intervals using a new package for our data science and statistical toolbox: the `infer` package for statistical inference.
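Before moving on, here is a small illustrative sketch linking the poll back to our virtual simulations. It assumes, purely for the sake of illustration, a hypothetical population of young Americans in which the true approval rating is 41%; it then simulates 1000 polls of size $n$ = 2089 to show how much such poll results would vary from sample to sample.

```{r, eval=FALSE}
library(dplyr)
library(ggplot2)

# Hypothetical illustration only: assume a true approval rating of 0.41 and
# simulate 1000 polls, each based on n = 2089 randomly sampled young Americans.
simulated_polls <- tibble(
  replicate = 1:1000,
  approval_rating = rbinom(1000, size = 2089, prob = 0.41) / 2089
)

ggplot(simulated_polls, aes(x = approval_rating)) +
  geom_histogram(binwidth = 0.005, color = "white") +
  labs(x = "Simulated sample approval rating",
       title = "Sampling variation across 1000 hypothetical polls of size n = 2089")
```

Under this made-up setup, most of the simulated approval ratings land within about two percentage points of 41%, which foreshadows the poll's reported margin of error that we revisit at the end of this chapter.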
+*** +## Conclusion {#sampling-conclusion} ---- + -## Conclusion {#sampling-conclusion} -### Table of inference scenarios {#sampling-conclusion-table} +### Central Limit Theorem {#sampling-conclusion-central-limit-theorem} + +What you did in Sections \@ref(sampling-activity) and \@ref(sampling-simulation) (in particular in Figure \@ref(fig:comparing-sampling-distributions) and Table \@ref(tab:comparing-n)) was demonstrate a very famous theorem, or mathematically proven truth, called the *Central Limit Theorem*. It loosely states that when sample means and sample proportions are based on larger and larger sample sizes, the sampling distribution of these two point estimates become more and more normally shaped and more and more narrow. In other words, their sampling distributions become more normally distributed and the spread/variation of these sampling distributions as quantified by their standard errors gets smaller. Shuyi Chiou, Casey Dunn, and Pathikrit Bhattacharyya created the following 3m38s video at https://www.youtube.com/embed/jvoxEYmQHNM explaining this crucial statistical theorem using the average weight of wild bunny rabbits and the average wing span of dragons as examples. Enjoy! + +
+ +
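If you'd like to see the Central Limit Theorem in action yourself with sample means rather than sample proportions, here is a small optional sketch. It uses a made-up, right-skewed population of 10,000 values (not the bowl) and the `rep_sample_n()` function from earlier in this chapter:

```{r, eval=FALSE}
library(dplyr)
library(ggplot2)
library(moderndive)  # rep_sample_n(); also available in the infer package

# A made-up right-skewed "population" of 10,000 values
skewed_population <- tibble(value = rexp(10000, rate = 1))

# 1000 replicates of the sample mean for each of n = 2, 10, and 50
sample_means <- bind_rows(
  skewed_population %>% rep_sample_n(size = 2, reps = 1000) %>%
    summarize(sample_mean = mean(value)) %>% mutate(n = 2),
  skewed_population %>% rep_sample_n(size = 10, reps = 1000) %>%
    summarize(sample_mean = mean(value)) %>% mutate(n = 10),
  skewed_population %>% rep_sample_n(size = 50, reps = 1000) %>%
    summarize(sample_mean = mean(value)) %>% mutate(n = 50)
)

# As n increases, the sampling distributions become more bell-shaped and narrower
ggplot(sample_means, aes(x = sample_mean)) +
  geom_histogram(binwidth = 0.05, color = "white") +
  facet_wrap(~n) +
  labs(x = "Sample mean", title = "Sampling distributions of the sample mean")
```

Even though this population is heavily skewed, the sampling distribution of the sample mean looks roughly normal by the time $n$ = 50, which is exactly what the Central Limit Theorem promises.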
+
+
+### Summary table {#sampling-conclusion-table}
+
+In this chapter, we performed both tactile and virtual simulations of sampling to infer about an unknown proportion. We also presented a case study of sampling in a real-life situation: polls. In both cases, we used the sample proportion $\widehat{p}$ to estimate the population proportion $p$. However, we are not just limited to scenarios related to statistical inference for proportions. In other words, we can consider population parameter and point estimate scenarios other than the population proportion $p$ and sample proportion $\widehat{p}$ studied in this chapter. We present 5 more such scenarios in Table \@ref(tab:summarytable-ch8).
+
+Note that the sample mean is traditionally denoted as $\overline{x}$ but can also be thought of as an estimate of the population mean $\mu$. Thus, it can also be denoted as $\widehat{\mu}$ as shown below in the table.
+
```{r summarytable-ch8, echo=FALSE, message=FALSE}
# The following Google Doc is published to CSV and loaded below using read_csv() below:
# https://docs.google.com/spreadsheets/d/1QkOpnBGqOXGyJjwqx1T2O5G5D72wWGfWlPyufOgtkk4/edit#gid=0
@@ -831,7 +833,7 @@ In this case of polls, any variation from the true approval rating is an "error"
"https://docs.google.com/spreadsheets/d/e/2PACX-1vRd6bBgNwM3z-AJ7o4gZOiPAdPfbTp_V15HVHRmOH5Fc9w62yaG-fEKtjNUD2wOSa5IJkrDMaEBjRnA/pub?gid=0&single=true&output=csv" %>%
  read_csv(na = "") %>%
  kable(
-    caption = "\\label{tab:summarytable}Scenarios of sampling for inference",
+    caption = "\\label{tab:summarytable-ch8}Scenarios of sampling for inference",
    booktabs = TRUE,
    escape = FALSE
  ) %>%
@@ -844,50 +846,27 @@ In this case of polls, any variation from the true approval rating is an "error"
  column_spec(5, width = "1in")
```

-We'll cover the first four scenarios in this chapter on confidence intervals and the following one on hypothesis testing:
-
-* Scenario 2 about means. Ex: the average age of pennies.
-* Scenario 3 about differences in proportions between two groups. Ex: the difference in high school completion rates for Canadians vs non-Canadians. We call this a situation of *two-sample* inference.
-* Scenario 4 is similar to 3, but its about the means of two groups. Ex: the difference in average test scores for the morning section of a class versus the afternoon section of a class. This is another situation of *two-sample* inference.

-In Chapter \@ref(inference-for-regression) on inference for regression, we'll cover Scenarios 5 & 6 about the regression line. In particular we'll see that the fitted regression line from Chapter \@ref(regression) on basic regression, $\widehat{y} = b_0 + b_1 \cdot x$, is in fact an estimate of some true population regression line $y = \beta_0 + \beta_1 \cdot x$ based on a sample of $n$ pairs of points $(x, y)$. Ex: Recall our sample of $n=463$ instructors at the UT Austin from the `evals` data set in Chapter \@ref(regression). Based on the results of the fitted regression model of teaching score with beauty score as an explanatory/predictor variable, what can we say about this relationship for *all* instructors, not just those at the UT Austin?

-In most cases, we don't have the population values as we did with the `bowl` of balls. We only have a single sample of data from a larger population. We'd like to be able to make some reasonable guesses about population parameters using that single sample to create a range of plausible values for a population parameter.
This range of plausible values is known as a **confidence interval** and will be the focus of this chapter. And how do we use a single sample to get some idea of how other samples might vary in terms of their statistic values? One common way this is done is via a process known as **bootstrapping** that will be the focus of the beginning sections of this chapter. - - -### Random sampling vs random assignment {#sampling-conclusion-sampling-vs-assignment} - +We'll cover all the remaining scenarios as follows, using the terminology, notation, and definitions related to sampling you saw in Section \@ref(sampling-framework): +* In Chapter \@ref(confidence-intervals), we'll cover examples of statistical inference for + + Scenario 2: The mean age $\mu$ of all pennies in circulation in the US. + + Scenario 3: The difference $p_1 - p_2$ in the proportion of people who yawn when seeing someone else yawn and the proportion of people who yawn without seeing someone else yawn. This is an example of *two-sample* inference. +* In Chapter \@ref(hypothesis-testing), we'll cover an example of statistical inference for + + Scenario 4: The difference $\mu_1 - \mu_2$ in average IMDB ratings for action and romance movies. This is another example of *two-sample* inference. +* In Chapter \@ref(inference-for-regression), we'll cover an example of statistical inference for the relationship between teaching score and various instructor demographic variables you saw in Chapter \@ref(regression) on basic regression and Chapter \@ref(multiple-regression) on multiple regression. Specifically + + Scenario 5: The intercept $\beta_0$ of some population regression line. + + Scenario 6: The slope $\beta_1$ of some population regression line. -### Theory: Central Limit Theorem {#sampling-conclusion-central-limit-theorem} - -What you did in Section \@ref(tactile) and \@ref(virtual) was demonstrate a very famous theorem, or mathematically proven truth, called the *Central Limit Theorem*. It loosely states that when sample means and sample proportions are based on larger and larger samples, the sampling distribution corresponding to these point estimates get - -1. More and more normal -1. More and more narrow - -Shuyi Chiou, Casey Dunn, and Pathikrit Bhattacharyya created the following three minute and 38 second video explaining this crucial theorem to statistics using as examples, what else? - -1. The average weight of wild bunny rabbits! -1. The average wing span of dragons! - -
- -
- - -### Formula: Standard error {#sampling-conclusion-standard-error} -### Closing notes - -This chapter serves as an introduction to the theoretical underpinning of the statistical inference techniques that will be discussed in greater detail in Chapter \@ref(confidence-intervals) for confidence intervals and Chapter \@ref(hypothesis-testing) for hypothesis testing. +### Additional resources An R script file of all R code used in this chapter is available [here](scripts/08-sampling.R). +### What's to come? +Recall in our Obama poll case study in Section \@ref(sampling-case-study) that based on this particular sample, the Harvard University Institute of Politics' best guess of Obama's approval rating among all young Americans was 41%. However, this isn't the end of the story. If you read further in the article, it states: +> The online survey of 2,089 adults was conducted from Oct. 30 to Nov. 11, just weeks after the federal government shutdown ended and the problems surrounding the implementation of the Affordable Care Act began to take center stage. The poll's margin of error was plus or minus 2.1 percentage points. - - - +Note the term *margin of error*, which here is plus or minus 2.1 percentage points. What this is saying is that most polls won't get it perfectly right; there will always be a certain amount of error caused by *sampling variation*. The margin of error of plus or minus 2.1 percentage points is saying that a typical range of errors for polls of this type is about $\pm$ 2.1%, in words from about 2.1% too small to about 2.1% too big for an interval of [41% - 2.1%, 41% + 2.1%] = [37.9%, 43.1%]. Remember that this notation corresponds to 37.9% and 43.1% being included as well as all numbers between the two of them. We'll see in the next chapter that such intervals are known as *confidence intervals*. diff --git a/09-confidence-intervals.Rmd b/09-confidence-intervals.Rmd index 76ca4e0f7..d249b4b88 100755 --- a/09-confidence-intervals.Rmd +++ b/09-confidence-intervals.Rmd @@ -21,6 +21,22 @@ options(scipen = 99, digits = 3) set.seed(76) ``` + + +*** + + + +```{block, type='announcement', purl=FALSE} +**In preparation for our first print edition to be published by CRC Press in Fall 2019, we're remodeling this chapter a bit. Don't expect major changes in content, but rather only minor changes in presentation. Our remodeling will be complete and available online at [ModernDive.com](https://moderndive.com/) by early Summer 2019!** +``` + + + +*** + + + In Chapter \@ref(sampling), we explored the process of sampling from a representative sample to build a sampling distribution. The motivation there was to use multiple samples from the same population to visualize and attempt to understand the variability in the statistic from one sample to another. 
Furthermore, recall our concepts and terminology related to sampling from the beginning of Chapter \@ref(sampling): Generally speaking, we learned that if the sampling of a sample of size $n$ is done at *random*, then the resulting sample is *unbiased* and *representative* of the *population*, thus any result based on the sample can *generalize* to the population, and hence the **point estimate/sample statistic** computed from this sample is a "good guess" of the unknown population parameter of interest @@ -89,7 +105,7 @@ library(infer) ---- +*** @@ -324,7 +340,7 @@ knitr::include_graphics("images/flowcharts/infer/ci_diagram.png") ---- +*** @@ -524,7 +540,7 @@ If we aren't able to use the sample mean as a good guess for the population mean ---- +*** @@ -661,7 +677,7 @@ After this elaboration on what the level corresponds to in a confidence interval ---- +*** @@ -948,7 +964,7 @@ Theoretical methods like this have largely been used in the past since we didn't ---- +*** @@ -1088,7 +1104,7 @@ Practice problems to come soon! ---- +*** diff --git a/10-hypothesis-testing.Rmd b/10-hypothesis-testing.Rmd index c58aa5083..b2696863d 100755 --- a/10-hypothesis-testing.Rmd +++ b/10-hypothesis-testing.Rmd @@ -21,6 +21,22 @@ options(scipen = 99, digits = 3) set.seed(76) ``` + + +*** + + + +```{block, type='announcement', purl=FALSE} +**In preparation for our first print edition to be published by CRC Press in Fall 2019, we're remodeling this chapter a bit. Don't expect major changes in content, but rather only minor changes in presentation. Our remodeling will be complete and available online at [ModernDive.com](https://moderndive.com/) by early Summer 2019!** +``` + + + +*** + + + We saw some of the main concepts of hypothesis testing introduced in Chapters \@ref(sampling) and \@ref(confidence-intervals). We will expand further on these ideas here and also provide a framework for understanding hypothesis tests in general. Instead of presenting you with lots of different formulas and scenarios, we hope to build a way to think about all hypothesis tests. You can then adapt to different scenarios as needed down the road when you encounter different statistical situations. The same can be said for confidence intervals. There was one general framework that applies to all confidence intervals and we elaborated on this using the `infer` package pipeline in Chapter \@ref(confidence-intervals). The specifics may change slightly for each variation, but the important idea is to understand the general framework so that you can apply it to more specific problems. We believe that this approach is much better in the long-term than teaching you specific tests and confidence intervals rigorously. You can find fully-worked out examples for five common hypothesis tests and their corresponding confidence intervals in Appendix \@ref(appendixB). @@ -47,7 +63,7 @@ library(knitr) ---- +*** @@ -77,7 +93,7 @@ library(knitr) ---- +*** @@ -126,7 +142,7 @@ As you get more and more practice with hypothesis testing, you'll be better able ---- +*** @@ -150,7 +166,7 @@ Before we hop into this framework, we will provide another way to think about hy ---- +*** @@ -193,7 +209,7 @@ When you run a hypothesis test, you are the jury of the trial. 
You decide wheth ---- +*** @@ -252,7 +268,7 @@ So if we can set $\alpha$ to be whatever we want, why choose 0.05 instead of 0.0 ---- +*** @@ -275,7 +291,7 @@ The idea that sample results are more extreme than we would reasonably expect to ---- +*** @@ -295,7 +311,7 @@ We'll first explore the two variable case by comparing two means. Note the secti ---- +*** @@ -651,7 +667,7 @@ we fail to reject $H_0$. (If no significance level is given, one can assume $\a ---- +*** @@ -663,7 +679,7 @@ These traditional methods have been used for many decades back to the time when ### Example: $t$-test for two independent samples -What is commonly done in statistics is the process of normalization. What this entails is calculating the mean and standard deviation of a variable. Then you subtract the mean from each value of your variable and divide by the standard deviation. The most common normalization is known as the $z$-score. The formula for a $z$-score is $$Z = \frac{x - \mu}{\sigma},$$ where $x$ represent the value of a variable, $\mu$ represents the mean of the variable, and $\sigma$ represents the standard deviation of the variable. Thus, if your variable has 10 elements, each one has a corresponding $z$-score that gives how many standard deviations away that value is from its mean. $z$-scores are normally distributed with mean 0 and standard deviation 1. They have the common, bell-shaped pattern seen below. +What is commonly done in statistics is the process of standardization. What this entails is calculating the mean and standard deviation of a variable. Then you subtract the mean from each value of your variable and divide by the standard deviation. The most common standardization is known as the $z$-score. The formula for a $z$-score is $$Z = \frac{x - \mu}{\sigma},$$ where $x$ represent the value of a variable, $\mu$ represents the mean of the variable, and $\sigma$ represents the standard deviation of the variable. Thus, if your variable has 10 elements, each one has a corresponding $z$-score that gives how many standard deviations away that value is from its mean. $z$-scores are normally distributed with mean 0 and standard deviation 1. They have the common, bell-shaped pattern seen below. ```{r echo=FALSE} ggplot(data.frame(x = c(-4, 4)), aes(x)) + stat_function(fun = dnorm) @@ -671,7 +687,7 @@ ggplot(data.frame(x = c(-4, 4)), aes(x)) + stat_function(fun = dnorm) Recall, that we hardly ever know the mean and standard deviation of the population of interest. This is almost always the case when considering the means of two independent groups. To help account for us not knowing the population parameter values, we can use the sample statistics instead, but this comes with a bit of a price in terms of complexity. -Another form of normalization occurs when we need to use the sample standard deviations as estimates for the unknown population standard deviations. This normalization is often called the $t$-score. For the two independent samples case like what we have for comparing action movies to romance movies, the formula is $$T =\dfrac{ (\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{ \sqrt{\dfrac{{s_1}^2}{n_1} + \dfrac{{s_2}^2}{n_2}} }$$ +Another form of standardization occurs when we need to use the sample standard deviations as estimates for the unknown population standard deviations. This standardization is often called the $t$-score. 
For the two independent samples case like what we have for comparing action movies to romance movies, the formula is $$T =\dfrac{ (\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{ \sqrt{\dfrac{{s_1}^2}{n_1} + \dfrac{{s_2}^2}{n_2}} }$$ There is a lot to try to unpack here. @@ -758,7 +774,7 @@ Since all three conditions are met, we can be reasonably certain that the theory ---- +*** diff --git a/11-inference-for-regression.Rmd b/11-inference-for-regression.Rmd index 11a5cb3f2..e932aed02 100644 --- a/11-inference-for-regression.Rmd +++ b/11-inference-for-regression.Rmd @@ -23,20 +23,18 @@ set.seed(76) ---- +*** + -```{block, type='learncheck', purl=FALSE} -**Note: This chapter is still under construction. If you would like to contribute, please check us out on GitHub at .** -
-/begin{center} -`r include_image(path = "images/sign-2408065_1920.png", html_opts="height=100px", - latex_opts = "width=20%")` -/end{center} -
+```{block, type='announcement', purl=FALSE} +**In preparation for our first print edition to be published by CRC Press in Fall 2019, we're remodeling this chapter a bit. Don't expect major changes in content, but rather only minor changes in presentation. Our remodeling will be complete and available online at [ModernDive.com](https://moderndive.com/) by early Summer 2019!** ``` ---- + + +*** + ### Needed packages {-} @@ -61,7 +59,7 @@ library(patchwork) ---- +*** @@ -162,7 +160,7 @@ Since `r pull(slope_obs)` falls far to the right of this plot beyond where any o ---- +*** @@ -182,7 +180,7 @@ To further reinforce the process being done in the pipeline, we've added the `ty If instead we'd like to get a range of plausible values for the true slope value, we can use the process of bootstrapping: -```{r echo=FALSE} +```{r} bootstrap_slope_distn <- evals %>% specify(score ~ bty_avg) %>% generate(reps = 10000, type = "bootstrap") %>% @@ -227,7 +225,7 @@ With the bootstrap distribution being close to symmetric, it makes sense that th ---- +*** @@ -333,7 +331,7 @@ An R script file of all R code used in this chapter is available [here](scripts/ ---- +*** diff --git a/12-thinking-with-data.Rmd b/12-thinking-with-data.Rmd index 6c189cfa9..bc3816226 100755 --- a/12-thinking-with-data.Rmd +++ b/12-thinking-with-data.Rmd @@ -11,29 +11,35 @@ rq <- 0 knitr::opts_chunk$set( tidy = FALSE, - out.width = '\\textwidth' + out.width = '\\textwidth', + fig.height = 4, + warning = FALSE ) + options(scipen = 99, digits = 3) -# This bit of code is a bug fix on asis blocks, which we use to show/not show LC -# solutions, which are written like markdown text. In theory, it shouldn't be -# necessary for knitr versions <=1.11.6, but I've found I still need to for -# everything to knit properly in asis blocks. More info here: -# https://stackoverflow.com/questions/32944715/conditionally-display-block-of-markdown-text-using-knitr -library(knitr) -knit_engines$set(asis = function(options) { - if (options$echo && options$eval) knit_child(text = options$code) -}) +# Set random number generator see value for replicable pseudorandomness. Why 76? +# https://www.youtube.com/watch?v=xjJ7FheCkCU +set.seed(76) +``` + + + +*** + -# This controls which LC solutions to show. Options for solutions_shown: "ALL" -# (to show all solutions), or subsets of c('4-4', '4-5'), including the -# null vector c('') to show no solutions. -solutions_shown <- c('') -show_solutions <- function(section){ - return(solutions_shown == "ALL" | section %in% solutions_shown) - } + +```{block, type='announcement', purl=FALSE} +**In preparation for our first print edition to be published by CRC Press in Fall 2019, we're remodeling this chapter a bit. Don't expect major changes in content, but rather only minor changes in presentation. Our remodeling will be complete and available online at [ModernDive.com](https://moderndive.com/) by early Summer 2019!** ``` + + +*** + + + + Recall in Section \@ref(sec:intro-for-students) "Introduction for students" and at the end of chapters throughout this book, we displayed the "ModernDive flowchart" mapping your journey through this book. ```{r moderndive-figure-conclusion, echo=FALSE, fig.align='center', fig.cap="ModernDive Flowchart"} @@ -88,19 +94,11 @@ library(scales) ``` -### DataCamp {-} - -The case study of Seattle house prices below was the inspiration for a large part of ModernDive co-author [Albert Y. Kim's](https://twitter.com/rudeboybert) DataCamp course "Modeling with Data in the Tidyverse." 
If you're interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters are Chapter 1 "Introduction to Modeling" and Chapter 3 "Modeling with Multiple Regression." - -```{r, echo=FALSE, results='asis'} -image_link(path = "images/datacamp_intro_to_modeling.png", link = "https://www.datacamp.com/courses/modeling-with-data-in-the-tidyverse") -``` - -Case studies involving data in the `fivethirtyeight` R package form the basis of ModernDive co-author [Chester Ismay's](https://twitter.com/old_man_chester?lang=en) DataCamp course "Effective Data Storytelling in the Tidyverse." This free course can be accessed [here](https://www.datacamp.com/courses/effective-data-storytelling-using-the-tidyverse-free). *** + ## Case study: Seattle house prices {#seattle-house-prices} [Kaggle.com](https://www.kaggle.com/) is a machine learning and predictive modeling competition website that hosts datasets uploaded by companies, governmental organizations, and other individuals. One of their datasets is the [House Sales in King County, USA](https://www.kaggle.com/harlfoxem/housesalesprediction) consisting of homes sold in between May 2014 and May 2015 in King County, Washington State, USA, which includes the greater Seattle metropolitan area. This [CC0: Public Domain](https://creativecommons.org/publicdomain/zero/1.0/) licensed dataset is included in the `moderndive` package in the `house_prices` data frame, which we'll refer to as the "Seattle house prices" dataset. @@ -229,7 +227,7 @@ data_frame(Price = c(1,10,100,1000,10000,100000,1000000)) %>% Let's break this down: 1. When purchasing a cup of coffee, we tend to think of prices ranging in single dollars e.g. \$2 or \$3. However when purchasing say mobile phones, we don't tend to think in prices in single dollars e.g. \$676 or \$757, but tend to round to the nearest unit of hundreds of dollars e.g. \$200 or \$500. -1. Let's say want to know the log10-transformed value of \$76. Even if this would be hard to compute without a calculator, we know that its log10 value is between 1 and 2, since \$76 is between \$10 and \$100. In fact, `log10(76)` is 1.880814. +1. Let's say we want to know the log10-transformed value of \$76. Even if this would be hard to compute without a calculator, we know that its log10 value is between 1 and 2, since \$76 is between \$10 and \$100. In fact, `log10(76)` is 1.880814. 1. log10-transformations are *monotonic*, meaning they preserve orderings. So if Price A is lower than Price B, then log10(Price A) will also be lower than log10(Price B). 1. Most importantly, increments of one in log10 correspond to multiplicative changes and not additive ones. For example, increasing from log10(Price) of 3 to 4 corresponds to a multiplicative increase by a factor of 10: \$100 to \$1000. @@ -441,22 +439,26 @@ intepreting the inference for regression in Subsection \@ref(house-prices-infere +*** + + + ## Case study: Effective data storytelling {#data-journalism} ---- ```{block, type='learncheck', purl=FALSE} **Note: This section is still under construction. If you would like to contribute, please check us out on GitHub at .** +```
-/begin{center} -`r include_image(path = "images/sign-2408065_1920.png", html_opts="height=100px", - latex_opts = "width=20%")` -/end{center} +`r include_image(path = "images/sign-2408065_1920.png", html_opts="height=100px", latex_opts = "width=20%")`
-``` ---- + + +*** + + As we've progressed throughout this book, you've seen how to work with data in a variety of ways. You've learned effective strategies for plotting data by understanding which types of plots work best for which combinations of variable types. You've summarized data in table form and calculated summary statistics for a variety of different variables. Further, you've seen the value of inference as a process to come to conclusions about a population by using a random sample. Lastly, you've explored how to use linear regression and the importance of checking the conditions required to make it a valid procedure. All throughout, you've learned many computational techniques and focused on reproducible research in writing R code. We now present another case study, but this time of the "effective data storytelling" done by data journalists around the world. Great data stories don't mislead the reader, but rather engulf them in understanding the importance that data plays in our lives through the captivation of storytelling. diff --git a/91-appendixA.Rmd b/91-appendixA.Rmd index f3d2a1b79..b1957a517 100755 --- a/91-appendixA.Rmd +++ b/91-appendixA.Rmd @@ -38,3 +38,7 @@ The **distribution** of a variable/dataset corresponds to generalizing patterns **Outliers** correspond to values in the dataset that fall far outside the range of "ordinary" values. In regards to a boxplot (by default), they correspond to values below $Q_1 - (1.5 * IQR)$ or above $Q_3 + (1.5 * IQR)$. Note that these terms (aside from **Distribution**) only apply to quantitative variables. + + + +## Normal distribution discussion diff --git a/92-appendixB.Rmd b/92-appendixB.Rmd index 16c2d37e0..4cf0e3c61 100755 --- a/92-appendixB.Rmd +++ b/92-appendixB.Rmd @@ -233,7 +233,11 @@ We see that `r mu0` is not contained in this confidence interval as a plausible **Interpretation**: We are 95% confident the true mean age of first marriage for all US women from 2006 to 2010 is between `r ci[["2.5%"]]` and `r ci[["97.5%"]]`. ---- + + +*** + + ### Traditional methods @@ -469,7 +473,11 @@ We see that 0.80 is contained in this confidence interval as a plausible value o **Interpretation**: We are 95% confident the true proportion of customers who are satisfied with the service they receive is between `r ci[["2.5%"]]` and `r ci[["97.5%"]]`. ---- + + +*** + + ### Traditional methods @@ -722,7 +730,11 @@ We see that 0 is not contained in this confidence interval as a plausible value **Interpretation**: We are 95% confident the true proportion of non-college graduates with no opinion on offshore drilling in California is between `r round(-ci[["2.5%"]], 2)` dollars smaller to `r round(-ci[["97.5%"]], 2)` dollars smaller than for college graduates. ---- + + +*** + + ### Traditional methods @@ -784,7 +796,11 @@ The $p$-value---the probability of observing a $Z$ value of -3.16 or more extrem We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that a statistically significant difference did not exist in the proportions of no opinion on offshore drilling between college educated and non-college educated Californians was not validated. We do have evidence to suggest that there is a dependency between college graduation and position on offshore drilling for Californians. 
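For reference, a quick sketch (ours, not from the appendix) of how such a tail probability can be computed in R from the standard normal distribution; whether one or both tails apply depends on the alternative hypothesis being tested:

```{r}
# Illustrative: tail probabilities for an observed test statistic of Z = -3.16
# under a standard normal distribution
pnorm(-3.16)       # lower-tail probability
2 * pnorm(-3.16)   # both tails, i.e. "-3.16 or more extreme in either direction"
```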
---- + + +*** + + ### Comparing results @@ -997,7 +1013,11 @@ We see that 0 is contained in this confidence interval as a plausible value of $ **Note**: You could also use the null distribution based on randomization with a shift to have its center at $\bar{x}_{sac} - \bar{x}_{cle} = \$`r round(d_hat, 2)`$ instead of at 0 and calculate its percentiles. The confidence interval produced via this method should be comparable to the one done using bootstrapping above. ---- + + +*** + + ### Traditional methods @@ -1258,7 +1278,11 @@ We see that 0 is not contained in this confidence interval as a plausible value **Interpretation**: We are 95% confident the true mean zinc concentration on the surface is between `r round(-ci[["2.5%"]], 2)` units smaller to `r round(-ci[["97.5%"]], 2)` units smaller than on the bottom. ---- + + +*** + + ### Traditional methods @@ -1316,7 +1340,11 @@ pt(-4.8638, df = nrow(zinc_diff) - 1, lower.tail = TRUE) We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that our observed sample mean difference was not statistically less than the hypothesized mean of 0 has been invalidated here. Based on this sample, we have evidence that the mean concentration in the bottom water is greater than that of the surface water at different paired locations. ---- + + +*** + + ### Comparing results diff --git a/94-appendixD.Rmd b/94-appendixD.Rmd index b4b5c8506..eab5b6d98 100644 --- a/94-appendixD.Rmd +++ b/94-appendixD.Rmd @@ -1,5 +1,8 @@ # Learning Check Solutions {#appendixD} + + ```{r setup_lc_solutions, include=FALSE, purl=FALSE} knitr::opts_chunk$set(tidy = FALSE, out.width = '\\textwidth') # This bit of code is a bug fix on asis blocks, which we use to show/not show LC solutions, which are written like markdown text. In theory, it shouldn't be necessary for knitr versions <=1.11.6, but I've found I still need to for everything to knit properly in asis blocks. More info here: @@ -27,6 +30,19 @@ library(ggplot2) library(nycflights13) ``` +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Repeat the above installing steps, but for the `dplyr`, `nycflights13`, and `knitr` packages. This will install the earlier mentioned `dplyr` package, the `nycflights13` package containing data on all domestic flights leaving a NYC airport in 2013, and the `knitr` package for writing reports in R. + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** "Load" the `dplyr`, `nycflights13`, and `knitr` packages as well by repeating the above steps. + +**Solution**: If the following code runs with no errors, you've succeeded! + +```{r, eval=FALSE} +library(dplyr) +library(nycflights13) +library(knitr) +``` + + **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What does any *ONE* row in this `flights` dataset refer to? - A. Data on an airline @@ -62,6 +78,18 @@ library(nycflights13) * `chr`: character. i.e. text +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What properties of the observational unit do each of `lat`, `lon`, `alt`, `tz`, `dst`, and `tzone` describe for the `airports` data frame? Note that you may want to use `?airports` to get more information. + +**Solution**: `lat` `long` represent the airport geographic coordinates, `alt` is the altitude above sea level of the airport (Run `airports %>% filter(faa == "DEN")` to see the altitude of Denver International Airport), `tz` is the time zone difference with respect to GMT in London UK, `dst` is the daylight savings time zone, and `tzone` is the time zone label. 
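For illustration (this chunk is ours, not part of the original solutions), one way to view these variables for a single airport:

```{r}
library(dplyr)
library(nycflights13)

# Illustrative: inspect the geographic and time zone variables for Denver
airports %>% 
  filter(faa == "DEN") %>% 
  select(faa, name, lat, lon, alt, tz, dst, tzone)
```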
+ + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Provide the names of variables in a data frame with at least three variables in which one of them is an identification variable and the other two are not. In other words, create your own tidy dataset that matches these conditions. + +**Solution**: + +* In the `weather` example in LC3.8, the combination of `origin`, `year`, `month`, `day`, `hour` are identification variables as they identify the observation in question. +* Anything else pertains to observations: `temp`, `humid`, `wind_speed`, etc. + *** @@ -147,7 +175,7 @@ ggplot(data = early_january_weather, mapping = aes(x = time_hour, y = humid)) + geom_line() ``` -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What does changing the number of bins from 30 to 60 tell us about the distribution of temperatures? +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What does changing the number of bins from 30 to 40 tell us about the distribution of temperatures? **Solution**: The distribution doesn't change much. But by refining the bin width, we see that the temperature data has a high degree of accuracy. What do I mean by accuracy? Looking at the `temp` variabile by `View(weather)`, we see that the precision of each temperature recording is 2 decimal places. @@ -190,7 +218,7 @@ the middle 50% of values, as delineated by the interquartile range is 30°F: **Solution**: -* We'd have 365 facets to look at. Way to many. +* We'd have 365 facets to look at. Way too many. * We don't really care about day-to-day fluctuation in weather so much, but maybe more week-to-week variation. We'd like to focus on seasonal trends. **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Does the `temp` variable in the `weather` data-set have a lot of variability? Why do you say that? @@ -241,9 +269,9 @@ weather %>% kable() ``` -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** We looked at the distribution of a numerical variable over a categorical variable here with this boxplot. Why can't we look at the distribution of one numerical variable over the distribution of another numerical variable? Say, temperature across pressure, for example? +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** We looked at the distribution of the numerical variable `temp` split by the numerical variable `month` that we converted to a categorical variable using the `factor()` function. Why would a boxplot of `temp` split by the numerical variable `pressure` similarly converted to a categorical variable using the `factor()` not be informative? -**Solution**: Because we need a way to group many numerical observations together, say by grouping by month. For pressure, we have near unique values for pressure, i.e. no groups, so we can't make boxplots. +**Solution**: Because there are 12 unique values of `month` yielding only 12 boxes in our boxplot. There are many more unique values of `pressure` (`r weather$pressure %>% unique() %>% length()` unique values in fact), because values are to the first decimal place. This would lead to `r weather$pressure %>% unique() %>% length()` boxes, which is too many for people to digest. **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Boxplots provide a simple way to identify outliers. Why may outliers be easier to identify when looking at a boxplot instead of a faceted histogram? @@ -311,127 +339,7 @@ weather %>% chap <- 4 lc <- 0 # This controls which LC solutions to show. 
Options for solutions_shown: "ALL" (to show all solutions), or subsets of c('2-1', '2-2'), including the null vector c('') to show no solutions. -solutions_shown <- c('4-1', '4-2', '4-3', '4-4') -# solutions_shown <- c('') -show_solutions <- function(section){return(solutions_shown == "ALL" | section %in% solutions_shown)} -``` - -```{r message=FALSE} -library(dplyr) -library(ggplot2) -library(nycflights13) -library(tidyr) -library(readr) -``` - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Consider the following data frame of average number of servings of beer, spirits, and wine consumption in three countries as reported in the FiveThirtyEight article [Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?](https://fivethirtyeight.com/features/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/) - -```{r echo=FALSE} -drinks_sub <- drinks %>% - select(-total_litres_of_pure_alcohol) %>% - filter(country %in% c("USA", "Canada", "South Korea")) -drinks_sub_tidy <- drinks_sub %>% - gather(type, servings, -c(country)) %>% - mutate( - type = str_sub(type, start=1, end=-10) - ) %>% - arrange(country, type) %>% - rename(`alcohol type` = type) -drinks_sub -``` - -This data frame is not in tidy format. What would it look like if it were? - -**Solution**: There are three variables of information included: country, alcohol type, and number of servings. In tidy format, each of these variables of information are included in their own column. - -```{r, include=show_solutions('4-1'), echo=FALSE} -drinks_sub_tidy -``` - -Note that how the rows are sorted is inconsequential in whether or not the data frame is in tidy format. In other words, the following data frame sorted by alcohol type instead of country is equally in tidy format. - -```{r, include=show_solutions('4-1'), echo=FALSE} -drinks_sub_tidy %>% - arrange(`alcohol type`) -``` - - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What properties of the observational unit do each of `lat`, `lon`, `alt`, `tz`, `dst`, and `tzone` describe for the `airports` data frame? Note that you may want to use `?airports` to get more information. - -**Solution**: `lat` `long` represent the airport geographic coordinates, `alt` is the altitude above sea level of the airport (Run `airports %>% filter(faa == "DEN")` to see the altitude of Denver International Airport), `tz` is the time zone difference with respect to GMT in London UK, `dst` is the daylight savings time zone, and `tzone` is the time zone label. - - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Provide the names of variables in a data frame with at least three variables in which one of them is an identification variable and the other two are not. In other words, create your own tidy dataset that matches these conditions. - -**Solution**: - -* In the `weather` example in LC3.8, the combination of `origin`, `year`, `month`, `day`, `hour` are identification variables as they identify the observation in question. -* Anything else pertains to observations: `temp`, `humid`, `wind_speed`, etc. - - - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Convert the `dem_score` data frame into -a tidy data frame and assign the name of `dem_score_tidy` to the resulting long-formatted data frame. - -**Solution**: Running the following in the console: - -```{r, include=show_solutions('4-3')} -dem_score_tidy <- gather(data = dem_score, key = year, value = democracy_score, - country) -``` - -Let's now compare the `dem_score` and `dem_score_tidy`. 
`dem_score` has democracy score information for each year in columns, whereas in `dem_score_tidy` there are explicit variables `year` and `democracy_score`. While both representations of the data contain the same information, we can only use `ggplot()` to create plots using the `dem_score_tidy` data frame. - -```{r, include=show_solutions('4-3')} -dem_score -dem_score_tidy -``` - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Read in the life expectancy data stored at and convert it to a tidy data frame. - -**Solution**: The code is similar - -```{r, eval=FALSE,include=show_solutions('4-3'), echo=show_solutions('4-3')} -life_expectancy <- read_csv('https://moderndive.com/data/le_mess.csv') -life_expectancy_tidy <- gather(data = life_expectancy, key = year, value = life_expectancy, -country) -``` -```{r, echo=FALSE, purl=FALSE, message=FALSE, warning=FALSE} -life_expectancy <- read_csv('data/le_mess.csv') -life_expectancy_tidy <- gather(data = life_expectancy, key = year, value = life_expectancy, -country) -``` - -We observe the same construct structure with respect to `year` in `life_expectancy` vs `life_expectancy_tidy` as we did in `dem_score` vs `dem_score_tidy`: - -```{r, lc4-2solutions-4, include=show_solutions('4-3')} -life_expectancy -life_expectancy_tidy -``` - - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are common characteristics of "tidy" datasets? - -**Solution**: Rows correspond to observations, while columns correspond to variables. - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What makes "tidy" datasets useful for organizing data? - -**Solution**: Tidy datasets are an organized way of viewing data. We'll see later that this format is required for the `ggplot2` and `dplyr` packages for data visualization and wrangling. - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are some advantages of data in normal forms? What are some disadvantages? - -**Solution**: When datasets are in normal form, we can easily `_join` them with other datasets! For example, can we join the `flights` data with the `planes` data? We'll see this more in Chapter 5! - - - -*** - - - -## Chapter 5 Solutions - -```{r, include=FALSE, purl=FALSE} -chap <- 5 -lc <- 0 -# This controls which LC solutions to show. Options for solutions_shown: "ALL" (to show all solutions), or subsets of c('2-1', '2-2'), including the null vector c('') to show no solutions. -solutions_shown <- c('5-1', '5-2', '5-3', '5-4', '5-5', '5-6', '5-7') +solutions_shown <- c('4-1', '4-2', '4-3', '4-4', '4-5', '4-6', '4-7') # solutions_shown <- c('') show_solutions <- function(section){return(solutions_shown == "ALL" | section %in% solutions_shown)} ``` @@ -443,7 +351,7 @@ library(nycflights13) ``` -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What's another way using the "not" operator `!` we could filter only the rows that are not going to Burlington, VT nor Seattle, WA in the `flights` data frame? Test this out using the code above. +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What's another way using the "not" operator `!` to filter only the rows that are not going to Burlington, VT nor Seattle, WA in the `flights` data frame? Test this out using the code above. **Solution**: @@ -636,6 +544,10 @@ with? **Solution**: This question is subjective! What surprises me is the high number of flights to Boston. Wouldn't it be easier and quicker to take the train? +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are some advantages of data in normal forms? 
What are some disadvantages? + +**Solution**: When datasets are in normal form, we can easily `_join` them with other datasets! For example, we can join the `flights` data with the `planes` data. + **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are some ways to select all three of the `dest`, `air_time`, and `distance` variables from `flights`? Give the code showing how to do this in at least three different ways. **Solution**: @@ -743,7 +655,7 @@ flights %>% summarize(ASM = sum(ASM)) ``` -However, because for certain carriers certain flights have missing `NA` values, the resulting table also returns `NA`'s. We can eliminate these by adding a `na.rm = TRUE` argument to `sum()`, telling R that we want to remove the `NA`'s in the sum. We saw this in Section \ref(summarize): +However, because for certain carriers certain flights have missing `NA` values, the resulting table also returns `NA`'s. We can eliminate these by adding a `na.rm = TRUE` argument to `sum()`, telling R that we want to remove the `NA`'s in the sum. We saw this in Section \@ref(summarize): ```{r, include=show_solutions('5-7')} flights %>% @@ -787,8 +699,114 @@ flights %>% +## Chapter 5 Solutions + +```{r, include=FALSE, purl=FALSE} +chap <- 5 +lc <- 0 +# This controls which LC solutions to show. Options for solutions_shown: "ALL" (to show all solutions), or subsets of c('2-1', '2-2'), including the null vector c('') to show no solutions. +solutions_shown <- c('5-1', '5-2', '5-3', '5-4') +# solutions_shown <- c('') +show_solutions <- function(section){return(solutions_shown == "ALL" | section %in% solutions_shown)} +``` + +```{r message=FALSE} +library(dplyr) +library(ggplot2) +library(nycflights13) +library(tidyr) +library(readr) +``` + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are common characteristics of "tidy" datasets? + +**Solution**: Rows correspond to observations, while columns correspond to variables. + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What makes "tidy" datasets useful for organizing data? + +**Solution**: Tidy datasets are an organized way of viewing data. This format is required for the `ggplot2` and `dplyr` packages for data visualization and wrangling. + + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Take a look the `airline_safety` data frame included in the `fivethirtyeight` data. Run the following: + +```{r, eval=FALSE} +airline_safety +``` + +After reading the help file by running `?airline_safety`, we see that `airline_safety` is a data frame containing information on different airlines companies' safety records. This data was originally reported on the data journalism website FiveThirtyEight.com in Nate Silver's article ["Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?"](https://fivethirtyeight.com/features/should-travelers-avoid-flying-airlines-that-have-had-crashes-in-the-past/). Let's ignore the `incl_reg_subsidiaries` and `avail_seat_km_per_week` variables for simplicity: + +```{r} +airline_safety_smaller <- airline_safety %>% + select(-c(incl_reg_subsidiaries, avail_seat_km_per_week)) +airline_safety_smaller +``` + +This data frame is not in "tidy" format. How would you convert this data frame to be in "tidy" format, in particular so that it has a variable `incident_type_years` indicating the indicent type/year and a variable `count` of the counts? 
+ +**Solution**: Using the `gather()` function from the `tidyr` package: + +```{r} +airline_safety_smaller_tidy <- airline_safety_smaller %>% + gather(key = incident_type_years, value = count, -airline) +airline_safety_smaller_tidy +``` + +If you look at the resulting `airline_safety_smaller_tidy` data frame in the spreadsheet viewer, you'll see that the variable `incident_type_years` has 6 possible values: `"incidents_85_99", "fatal_accidents_85_99", "fatalities_85_99", +"incidents_00_14", "fatal_accidents_00_14", "fatalities_00_14"` corresponding to the 6 columns of `airline_safety_smaller` we tidied. + + + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Convert the `dem_score` data frame into +a tidy data frame and assign the name of `dem_score_tidy` to the resulting long-formatted data frame. + +**Solution**: Running the following in the console: + +```{r, include=show_solutions('4-3')} +dem_score_tidy <- dem_score %>% + gather(key = year, value = democracy_score, - country) +``` + +Let's now compare the `dem_score` and `dem_score_tidy`. `dem_score` has democracy score information for each year in columns, whereas in `dem_score_tidy` there are explicit variables `year` and `democracy_score`. While both representations of the data contain the same information, we can only use `ggplot()` to create plots using the `dem_score_tidy` data frame. + +```{r, include=show_solutions('4-3')} +dem_score +dem_score_tidy +``` + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Read in the life expectancy data stored at and convert it to a tidy data frame. + +**Solution**: The code is similar + +```{r, eval=FALSE,include=show_solutions('4-3'), echo=show_solutions('4-3')} +life_expectancy <- read_csv("https://moderndive.com/data/le_mess.csv") +life_expectancy_tidy <- life_expectancy %>% + gather(key = year, value = life_expectancy, -country) +``` +```{r, echo=FALSE, purl=FALSE, message=FALSE, warning=FALSE} +life_expectancy <- read_csv('data/le_mess.csv') +life_expectancy_tidy <- life_expectancy %>% + gather(key = year, value = life_expectancy, -country) +``` + +We observe the same construct structure with respect to `year` in `life_expectancy` vs `life_expectancy_tidy` as we did in `dem_score` vs `dem_score_tidy`: + +```{r, lc4-2solutions-4, include=show_solutions('4-3')} +life_expectancy +life_expectancy_tidy +``` + + + + +*** + + + + ## Chapter 6 Solutions +To come! + ```{r, include=FALSE, purl=FALSE} chap <- 6 lc <- 0 diff --git a/NEWS.md b/NEWS.md index 3c2af5a3b..2d66c5439 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,26 +1,9 @@ -# ModernDive 0.4.0.9000 +# ModernDive 0.5.0.9000 ## Major refactoring of inference chapters of book -### Old chapter structure - -* Chapter 8 - Sampling - 1. Introduction to sampling - a) Concepts related to sampling - b) Inference via sampling - 2. Tactile sampling simulation - a) Using the shovel once - b) Using the shovel 33 times - 3. Virtual sampling simulation - a) Using the shovel once - b) Using shovel 33 times - c) Using shovel 1000 times - d) Using different shovels - 4. In real-life sampling: Polls - 5. Conclusion - a) Central Limit Theorem - b) What’s to come? - c) Script of R code +**Old Chapter Structure**: + * Chapter 9 - Confidence Intervals 1. Bootstrapping a) Data explanation @@ -83,61 +66,36 @@ d) Script of R code -### New chapter structure - -* Chapter 8 - Sampling - 1. Activity: Sampling from a bowl - a) Question: What proportion of this bowl is red? - b) Using shovel once - c) Using shovel 33 times - 1. 
Computer simulation: - a) What is a simulation? We just did a "tactile" one by hand, now let's do one using the the computer - b) Using shovel once - c) Using shovel 33 times - d) Using shovel 1000 times - e) Using different shovels - 1. Goal: Study fluctuations due to sampling variation - a) You probably already knew: Bigger sample size means "better" guess. - b) Comparing shovels: Role of sample size - 1. Framework: Sampling - a) Terminology for sampling (population, sample, point estimate, etc) - b) Statistical concepts: sampling distribution and standard error - c) Computer's random number generator - 1. Interpretation: - a) Visual display of differences - 1. Case study: Obama poll - 1. Big picture: - a) Table of inferential scenarios: Add bowl and obama poll (both p) - b) Why does this work? Theoretial result: CLT - c) There's a formula for that: SE formula that has sqrt(n) at the bottom - d) Appendix: Normal distribution discussion +**New Chapter Structure**: + * Chapter 9 - Confidence Intervals 1. Activity: Working with a sample of pennies from the bank. Are they representative of all pennies in the US. a) Question: What do I do when I only have one sample? b) Resampling once (paper slips) c) Resampling 33 times + d) Diagrams in Keynote 1. Computer simulation: a) What is resampling? b) Resampling once c) Resampling 33 times d) Resampling 1000 times 1. Goal: Generate an estimate that accounts for sampling variation - a) Constructing a confidence interval + a) Constructing a confidence interval: hide code to shade ci region and to get the actual values. b) Constructing a CI using percentile method c) Constructing a CI using SE method 1. Framework: Boostrap resampling with replacement a) What dplyr verbs did we use? b) There is only one test framework - c) the infer package + c) the infer package: make sure to draw parallels between dplyr code and infer verbs 1. Interpretation: - a) 95% speaks to reliability of the process, not about an particular interval + a) 95% speaks to reliability of the process, not about an particular interval. "We are 95% confident" b) What determines the width? Sample size, confidence levels (only int at population variance) 1. Case study: Comparing two proportions with Mythbusters data 1. Big picture: - a) Does this even work? Comparing sampling and bootstrap distribution. + a) Does this even work? Comparing sampling and bootstrap distribution. Do this using balls. b) Table of inferential scenarios: Add pennies (mu) and Mythbusters (p1 - p2) - c) Why does this work? Theoretical result: Donsker's theorem. The empirical CDF converges to the population CDF. Bootstrap works for any point estimate - d) There's a formula for that! Margin of error using critical values z* + c) Why does this work? Theoretical result: Efron. The empirical CDF converges to the population CDF. Bootstrap works for any point estimate + d) There's a formula for that! Margin of error using critical values z. Talk about normal distributions. * Chapter 10 - Hypothesis Testing 1. Activity: Shuffling resumes between male and female job applicants a) Question: Are men and women rated for jobs differently? @@ -145,79 +103,147 @@ c) What about sampling variation? d) What did we actually observe? e) How likely is this result? - 1. Computer simulation: + f) Diagrams in Keynote + 1. Extension of previous framework/infer + a) Revisit verb framework + a) Permutation test resampling w/o replacement + b) There is only one test framework + a) Do activity via infer package 1. 
Goal: Choose between two possible truths while accounting for sampling variation a) Conducting a hypothesis test b) Null hypothesis that's assumed c) Null distribution of test statistics: A "alternate universe" distribution d) Observed test statistics e) Definition of p-value - 1. Framework: Permutation test resampling w/o replacement - a) Revisit verb framework - b) There is only one test framework - c) the infer package 1. Interpretation: a) A yes/no-type decision: statistical significance via alpha b) Types of errors: 2x2 table c) Analogy of criminal justice system 1. Case study: Comparing two means with action vs romance movie data 1. Big picture: + a) When is inference not needed: EDA can solve the problem. a) Problems with p-values: p-hacking, hard to understand, ASA statement b) Comparison with confidence intervals. HT yields binary decision, but CI's yield plausible range of estimates. This is statistical vs practical significance c) Table of inferential scenarios: Add action vs romance (mu1 - mu2) - d) Why does this work? Theoretical result: Neyman-Pearson lemma - e) There's a formula for that! t-test + d) Why does this work? Theoretical result: Neyman-Pearson lemma (maybe) + e) There's a formula for that! t-test. Draw a null distribution with t-distribution superimposed. * Chapter 11 - Inference for Regression 1. Activity: Revisit simple linear regression a) Question: Is there a significant relationship between teaching score and bty score above and beyond any evidence due to sampling variation. b) Review exercise/re-run all code c) Regression table 1. Computer simulation: - a) Bootstraping the relationship - b) Permuting the relationship + a) Permuting the relationship: to do a hypothesis test assuming independence of y & x. + a) Bootstraping the rows: Having done HT, generate confidence interval. 1. Goal: Inferring about the population regression slope - a) 1. Framework: 1. Interpretation: - a) Values in table are given! No simulations necessary! - b) Conditions for inference: residual and partial residual plots - 1. Case study: Mmultiple regression example from Ch 7. - a) + a) "You don't have to do any of this! Values in table are given!" No simulations necessary! + b) Conditions for inference: residual and partial residual plots, assumption of indepdence. + 1. Case study: Multiple regression example from Ch 7. 1. Big picture: a) ANOVA = Regression with categorical variables - b) Table of inferential scenarios: Add TBD (beta1) - c) Why does this work? Theoretical result: Gauss-Markov Theorem - d) There's a formula for that! Fitted intercept and slope. SE of fitted intercept and slope. Note there is a sqrt(n) in denominator. + b) Table of inferential scenarios: Add (beta1) + c) Why does this work? + d) There's a formula for that! Fitted intercept and slope. SE of fitted intercept and slope: observe there is a sqrt(n) in denominator. + +*** + + + +# ModernDive 0.5.0 + +## Highlights + +* "Data wrangling" chapter now comes after "Tidy data" chapter. 
+* Improved explanations and examples of `geom_histogram()`, `geom_boxplot()`, and "tidy" data +* Moving residual analysis from regression Chapters 6 & 7 to Chap 11: Inference for regression +* Reorganized Chap 8 on Sampling +* All learning check solutions now in Appendix D +* PDF build re-added (still a work-in-progress) + ## All content changes -* Changed title from "Statistical Inference via Data Science in R" to "Statistical Inference via Data Science: A moderndive into R and the tidyverse" +* Changed title + + From: "Statistical Inference via Data Science in R" + + To: "Statistical Inference via Data Science: A moderndive into R and the tidyverse" * Chapter 2 - Getting Started + Added subsection 2.2.3 "Errors, warnings, and messages" by @andrewheiss * Chapter 3 - Data visualization: - + Added simpler introductory `geom_histogram()` example - + Added simpler introductory `geom_boxplot()` example - + Started downweighting the amount of data wrangling previews included in this chapter, in particular `join` + + Added simpler introductory `geom_histogram()` and `geom_boxplot()` examples + + Started downweighting the amount of data wrangling previews included in this chapter, in particular `join`. + Cleaned up conclusion section + + Added cheatsheet * Switched order of "Chap 4 Tidy Data" and "Chap 5 Data Wrangling": Data Wrangling now comes first * Chapter 4 - Data wrangling: + + Added cheatsheet * Chapter 5 - Renamed to "Importing and tidy data" + Reordered sections: importing then tidying - + Added `fivethirtyeight::drinks` example of hitting the non-tidy wall, then using `tidyr::gather()` + + Added `fivethirtyeight::drinks` example of "hitting the non-tidy wall", then using `tidyr::gather()` + Made Guatemala democracy score a case study. + Added discussion on what `tidyverse` package is. + + Moved discussion on normal forms to Ch4: Data Wrangling - joins. + + Moved discussion on identification vs measurement variables to Ch2: Getting started with data. * Chapter 6 - Basic regression: + Moved residual analysis to Chapter 11 * Chapter 7 - Multiple regression: + Moved residual analysis to Chapter 11 +* Chapter 8 - Sampling: Major refactoring of presentation/exposition; see below * Chapter 11 - Inference for regression: + Moved residual analysis from Chapter 6 & 7 here * Moved all Learning Check solutions to Appendix D - -## Other changes -* Added PDF build + +### Chapter 8 Sampling Refactoring + +**Old chapter structure**: + +1. Introduction to sampling + a) Concepts related to sampling + b) Inference via sampling +2. Tactile sampling simulation + a) Using the shovel once + b) Using the shovel 33 times +3. Virtual sampling simulation + a) Using the shovel once + b) Using shovel 33 times + c) Using shovel 1000 times + d) Using different shovels +4. In real-life sampling: Polls +5. Conclusion + a) Central Limit Theorem + b) What’s to come? + c) Script of R code + +**New chapter structure**: + +1. Activity: Sampling from a bowl + a) Question: What proportion of this bowl is red? + b) Using shovel once + c) Using shovel 33 times +1. Computer simulation: + a) What is a simulation? We just did a "tactile" one by hand, now let's do one using the the computer + b) Using shovel once + c) Using shovel 33 times + d) Using shovel 1000 times + e) Using different shovels +1. Goal: Study fluctuations due to sampling variation + a) You probably already knew: Bigger sample size means "better" guess. + b) Comparing shovels: Role of sample size +1. 
Framework: Sampling + a) Terminology for sampling (population, sample, point estimate, etc) + b) Statistical concepts: sampling distribution and standard error + c) Computer's random number generator +1. Interpretation: + a) Visual display of differences +1. Case study: Obama poll +1. Big picture: + a) Table of inferential scenarios: Add bowl and obama poll (both p) + b) Why does this work? Theoretial result: CLT + c) There's a formula for that: SE formula that has sqrt(n) at the bottom + d) Appendix: Normal distribution discuss diff --git a/_output.yml b/_output.yml index acfa7773e..7946f5888 100755 --- a/_output.yml +++ b/_output.yml @@ -1,4 +1,20 @@ # Modified from https://github.com/rstudio/bookdown/blob/master/inst/examples/_output.yml +#bookdown::pdf_book: +# includes: +# in_header: latex/preamble.tex +# before_body: latex/before_body.tex +# after_body: latex/after_body.tex +# keep_tex: true +# dev: "cairo_pdf" +# latex_engine: xelatex +# citation_package: natbib +# template: null +# pandoc_args: --top-level-division=chapter +# toc_depth: 3 +# toc_unnumbered: false +# toc_appendix: true +# quote_footer: ["\\VA{", "}{}"] +# highlight_bw: true bookdown::gitbook: df_print: default css: style.css @@ -18,19 +34,4 @@ bookdown::gitbook: before_body: _includes/logo.html #bookdown::epub_book: default #bookdown::word_document2: default -#bookdown::pdf_book: -# includes: -# in_header: latex/preamble.tex -# before_body: latex/before_body.tex -# after_body: latex/after_body.tex -# keep_tex: true -# dev: "cairo_pdf" -# latex_engine: xelatex -# citation_package: natbib -# template: null -# pandoc_args: --top-level-division=chapter -# toc_depth: 3 -# toc_unnumbered: false -# toc_appendix: true -# quote_footer: ["\\VA{", "}{}"] -# highlight_bw: true + diff --git a/docs/10-example-comparing-two-proportions.html b/docs/10-example-comparing-two-proportions.html deleted file mode 100644 index e88725482..000000000 --- a/docs/10-example-comparing-two-proportions.html +++ /dev/null @@ -1,784 +0,0 @@ - - - - - - - - Chapter 10 Example: Comparing two proportions | Statistical Inference via Data Science - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Chapter 10 Example: Comparing two proportions

-

If you see someone else yawn, are you more likely to yawn? In an episode of the show Mythbusters, they tested the myth that yawning is contagious. The snippet from the show is available to view in the United States on the Discovery Network website here. More information about the episode is also available on IMDb here.

-

Fifty adults who thought they were being considered for an appearance on the show were interviewed by a show recruiter (“confederate”) who either yawned or did not. Participants then sat by themselves in a large van and were asked to wait. While in the van, the Mythbusters watched via hidden camera to see if the unaware participants yawned. The data frame containing the results is available at mythbusters_yawn in the moderndive package. Let’s check it out.

-
mythbusters_yawn
-
# A tibble: 50 x 3
-    subj group   yawn 
-   <int> <chr>   <chr>
- 1     1 seed    yes  
- 2     2 control yes  
- 3     3 seed    no   
- 4     4 seed    yes  
- 5     5 seed    no   
- 6     6 control no   
- 7     7 seed    yes  
- 8     8 control no   
- 9     9 control no   
-10    10 seed    no   
-# … with 40 more rows
-
    -
• The participant ID is stored in the subj variable with values of 1 to 50.
• The group variable is either "seed" for when a confederate was trying to influence the participant or "control" if a confederate did not interact with the participant.
• The yawn variable is either "yes" if the participant yawned or "no" if the participant did not yawn.
-

We can use the janitor package to get a glimpse into this data in a table format:

-
mythbusters_yawn %>% 
-  tabyl(group, yawn) %>% 
-  adorn_percentages() %>% 
-  adorn_pct_formatting() %>% 
-  # To show original counts
-  adorn_ns()
-
   group         no        yes
- control 75.0% (12) 25.0%  (4)
-    seed 70.6% (24) 29.4% (10)
-

We are interested in comparing the proportion of those that yawned after seeing a seed versus those that yawned with no seed interaction. We’d like to see if the difference between these two proportions is significantly larger than 0. If so, we’d have evidence to support the claim that yawning is contagious based on this study.

-

In looking over this problem, we can make note of some important details to include in our infer pipeline:

-
    -
• We are calling a yawn value of "yes" a success.
• Our response variable will always correspond to the variable used in the success, so the response variable is yawn.
• The explanatory variable is the other variable of interest here: group.
-

To summarize, we are looking to examine the relationship between yawning and whether or not the participant saw a seed yawn.

-
-

10.0.1 Compute the point estimate

-
mythbusters_yawn %>% 
-  specify(formula = yawn ~ group)
-
Error: A level of the response variable `yawn` needs to be specified for the `success` argument in `specify()`.
-

Note that the success argument must be specified in situations such as this where the response variable has only two levels.

-
mythbusters_yawn %>% 
-  specify(formula = yawn ~ group, success = "yes")
-
Response: yawn (factor)
-Explanatory: group (factor)
-# A tibble: 50 x 2
-   yawn  group  
-   <fct> <fct>  
- 1 yes   seed   
- 2 yes   control
- 3 no    seed   
- 4 yes   seed   
- 5 no    seed   
- 6 no    control
- 7 yes   seed   
- 8 no    control
- 9 no    control
-10 no    seed   
-# … with 40 more rows
-

We next want to calculate the statistic of interest for our sample. This corresponds to the difference in the proportion of successes.

-
mythbusters_yawn %>% 
-  specify(formula = yawn ~ group, success = "yes") %>% 
-  calculate(stat = "diff in props")
-
Error: Statistic is based on a difference; specify the `order` in which to subtract the levels of the explanatory variable. `order = c("first", "second")` means `("first" - "second")`. Check `?calculate` for details.
-

We see another error here. To make sure that R knows exactly what we are after, we need to provide the order in which R should subtract these proportions of successes. As the error message states, we'll want to put "seed" first after c() and then "control": order = c("seed", "control"). Our point estimate is thus calculated:

-
obs_diff <- mythbusters_yawn %>% 
-  specify(formula = yawn ~ group, success = "yes") %>% 
-  calculate(stat = "diff in props", order = c("seed", "control"))
-obs_diff
-
# A tibble: 1 x 1
-    stat
-   <dbl>
-1 0.0441
-

This value represents the proportion of those that yawned after seeing a seed yawn (0.2941) minus the proportion of those that yawned without seeing a seed (0.25).
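As a quick sanity check (our own, using the counts from the janitor table above: 10 of 34 "seed" participants and 4 of 16 "control" participants yawned), the same point estimate can be computed by hand:

```{r}
# By-hand check of the point estimate from the counts in the table above
prop_seed    <- 10 / (10 + 24)   # proportion of "seed" participants who yawned
prop_control <- 4 / (4 + 12)     # proportion of "control" participants who yawned
prop_seed - prop_control         # matches the 0.0441 returned by calculate()
```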

-
-
-

10.0.2 Bootstrap distribution

-

Our next step in building a confidence interval is to create a bootstrap distribution of statistics (differences in proportions of successes). We saw how this works with a single variable both in computing bootstrap means in Subsection 9.1.3 and in computing bootstrap proportions in Section 9.6, but we haven't yet worked with bootstrapping involving multiple variables.

-

In the infer package, bootstrapping with multiple variables means that each row is potentially resampled. Let’s investigate this by looking at the first few rows of mythbusters_yawn:

-
head(mythbusters_yawn)
-
# A tibble: 6 x 3
-   subj group   yawn 
-  <int> <chr>   <chr>
-1     1 seed    yes  
-2     2 control yes  
-3     3 seed    no   
-4     4 seed    yes  
-5     5 seed    no   
-6     6 control no   
-

When we bootstrap this data, we are potentially pulling the subject’s readings multiple times. Thus, we could see the entries of "seed" for group and "no" for yawn together in a new row in a bootstrap sample. This is further seen by exploring the sample_n() function in dplyr on this smaller 6 row data frame comprised of head(mythbusters_yawn). The sample_n() function can perform this bootstrapping procedure and is similar to the rep_sample_n() function in infer, except that it is not repeated but rather only performs one sample with or without replacement.

-
set.seed(2019)
-
head(mythbusters_yawn) %>% 
-  sample_n(size = 6, replace = TRUE)
-
# A tibble: 6 x 3
-   subj group   yawn 
-  <int> <chr>   <chr>
-1     5 seed    no   
-2     5 seed    no   
-3     2 control yes  
-4     4 seed    yes  
-5     1 seed    yes  
-6     1 seed    yes  
-

We can see that in this bootstrap sample generated from the first six rows of mythbusters_yawn, we have some rows repeated. The same is true when we perform the generate() step in infer as done below.

-
bootstrap_distribution <- mythbusters_yawn %>% 
-  specify(formula = yawn ~ group, success = "yes") %>% 
-  generate(reps = 1000) %>% 
-  calculate(stat = "diff in props", order = c("seed", "control"))
-
bootstrap_distribution %>% 
-  visualize(bins = 20)
-

-

This distribution is roughly symmetric and bell-shaped but isn’t quite there. Let’s use the percentile-based method to compute a 95% confidence interval for the true difference in the proportion of those that yawn with and without a seed presented. The arguments are explicitly listed here but remember they are the defaults and simply get_ci() can be used.

-
bootstrap_distribution %>% 
-  get_ci(type = "percentile", level = 0.95)
-
# A tibble: 1 x 2
-  `2.5%` `97.5%`
-   <dbl>   <dbl>
-1 -0.219   0.293
-

The confidence interval shown here includes the value of 0. We'll see further in Chapter 11 what this means in terms of this difference being statistically significant or not, but let's examine it a bit here first. The range of plausible values for the difference in the proportion of those that yawned with and without a seed is between -0.219 and 0.293.

-

Therefore, we are not sure which proportion is larger. Some of the bootstrap statistics showed the proportion without a seed to be higher and others showed the proportion with a seed to be higher. If the confidence interval were entirely above zero, we would be relatively sure (about "95% confident") that the seed group had a higher proportion of yawning than the control group.
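One rough way (our own sketch) to see this split is to count how many of the bootstrap statistics fall on either side of 0, using the bootstrap_distribution object created above:

```{r}
library(dplyr)

# Illustrative: fraction of bootstrap statistics above and below 0
bootstrap_distribution %>% 
  summarize(prop_above_0 = mean(stat > 0),
            prop_below_0 = mean(stat < 0))
```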

-

Note that this all relates to the importance of denoting the order argument in the calculate() function. Since we specified "seed" and then "control", positive values for the statistic correspond to the "seed" proportion being higher, whereas negative values correspond to the "control" group being higher.

-

We therefore have evidence, via this confidence interval, suggesting that the Mythbusters show's declaring the "yawning is contagious" myth "confirmed" is not statistically appropriate.

-
-

Learning check

-
-

Practice problems to come soon!

-
- -
-
-
-
-

10.1 Conclusion

-
-

10.1.1 What’s to come?

-

This chapter introduced the notions of bootstrapping and confidence intervals as ways to build intuition about population parameters using only the original sample information. We also concluded with a glimpse into statistical significance and we’ll dig much further into this in Chapter 11 up next!


10.1.2 Script of R code


An R script file of all R code used in this chapter is available here.

diff --git a/docs/10-hypothesis-testing.html b/docs/10-hypothesis-testing.html

        Chapter 10 Hypothesis Testing

In preparation for our first print edition to be published by CRC Press in Fall 2019, we're remodeling this chapter a bit. Don't expect major changes in content, but rather only minor changes in presentation. Our remodeling will be complete and available online at ModernDive.com by early Summer 2019!

        We saw some of the main concepts of hypothesis testing introduced in Chapters 8 and 9. We will expand further on these ideas here and also provide a framework for understanding hypothesis tests in general. Instead of presenting you with lots of different formulas and scenarios, we hope to build a way to think about all hypothesis tests. You can then adapt to different scenarios as needed down the road when you encounter different statistical situations.

        The same can be said for confidence intervals. There was one general framework that applies to all confidence intervals and we elaborated on this using the infer package pipeline in Chapter 9. The specifics may change slightly for each variation, but the important idea is to understand the general framework so that you can apply it to more specific problems. We believe that this approach is much better in the long-term than teaching you specific tests and confidence intervals rigorously. You can find fully-worked out examples for five common hypothesis tests and their corresponding confidence intervals in Appendix B.

        We recommend that you carefully review these examples as they also cover how the general frameworks apply to traditional normal-based methodologies like the \(t\)-test and normal-theory confidence intervals. You’ll see there that these methods are just approximations for the general computational frameworks, but require conditions to be met for their results to be valid. The general frameworks using randomization, simulation, and bootstrapping do not hold the same sorts of restrictions and further advance computational thinking, which is one big reason for their emphasis throughout this textbook.


        10.1 When inference is not needed

        To further understand just how different the air_time variable is for BOS and SFO, let’s look at a boxplot:

        ggplot(data = bos_sfo, mapping = aes(x = dest, y = air_time)) +
           geom_boxplot()

        Since there is no overlap at all, we can conclude that the air_time for San Francisco flights is statistically greater (at any level of significance) than the air_time for Boston flights. This is a clear example of not needing to do anything more than some simple exploratory data analysis with descriptive statistics and data visualization to get an appropriate inferential conclusion. This is one reason why you should ALWAYS investigate the sample data first using dplyr and ggplot2 via exploratory data analysis.
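For instance, a quick numerical complement to the boxplot might look like the following sketch (assuming dplyr is loaded and bos_sfo contains the dest and air_time variables used above):

# Hypothetical EDA check: compare the two destinations numerically
bos_sfo %>% 
  group_by(dest) %>% 
  summarize(mean_time = mean(air_time), sd_time = sd(air_time))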

        As you get more and more practice with hypothesis testing, you’ll be better able to determine in many cases whether or not the results will be statistically significant. There are circumstances where it is difficult to tell, but you should always try to make a guess FIRST about significance after you have completed your data exploration and before you actually begin the inferential techniques.



10.4 Types of errors in hypothesis testing

      The risk of error is the price researchers pay for basing an inference about a population on a sample. With any reasonable sample-based procedure, there is some chance that a Type I error will be made and some chance that a Type II error will occur.

      To help understand the concepts of Type I error and Type II error, observe the following table:

FIGURE 10.2: Type I and Type II errors
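In the standard notation, the probability of a Type I error is usually denoted \(\alpha\) (the significance level set by the researcher) and the probability of a Type II error is denoted \(\beta\); for example, \(\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})\).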

10.7.2 Comparing action and romance movies

      Let’s now visualize the distributions of rating across both levels of genre. Think about what type(s) of plot is/are appropriate here before you proceed:

      ggplot(data = movies_trimmed, aes(x = genre, y = rating)) +
         geom_boxplot()
FIGURE 10.3: Rating vs genre in the population


      10.7.4 Data

      We can now observe the distributions of our two sample ratings for both groups. Remember that these plots should be rough approximations of our population distributions of movie ratings for "Action" and "Romance" in our population of all movies in the movies data frame.

      ggplot(data = movies_genre_sample, aes(x = genre, y = rating)) +
         geom_boxplot()
FIGURE 10.5: Genre vs rating for our sample


      ggplot(data = movies_genre_sample, mapping = aes(x = rating)) +
         geom_histogram(binwidth = 1, color = "white") +
         facet_grid(genre ~ .)
FIGURE 10.6: Genre vs rating for our sample as faceted histogram

      @@ -932,8 +932,8 @@

      10.7.9 Distribution of A null distribution of simulated differences in sample means is created with the specification of stat = "diff in means" for the calculate() step. The null distribution is similar to the bootstrap distribution we saw in Chapter 9, but remember that it consists of statistics generated assuming the null hypothesis is true.
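For reference, a sketch of how such a null distribution could be generated with the infer pipeline is below (an assumption-laden reconstruction using the movies_genre_sample data frame from earlier; the chapter's exact chunk is not shown in this excerpt):

# Hypothetical reconstruction of the null distribution pipeline
null_distribution_two_means <- movies_genre_sample %>% 
  specify(formula = rating ~ genre) %>% 
  hypothesize(null = "independence") %>% 
  generate(reps = 5000, type = "permute") %>% 
  calculate(stat = "diff in means", order = c("Romance", "Action"))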

      We can now plot the distribution of these simulated differences in means:

      null_distribution_two_means %>% visualize()
FIGURE 10.7: Simulated differences in means histogram


      10.7.10 The p-value

      Remember that we are interested in seeing where our observed sample mean difference of 0.95 falls on this null/randomization distribution. We are interested in simply a difference here so “more extreme” corresponds to values in both tails on the distribution. Let’s shade our null distribution to show a visual representation of our \(p\)-value:

      null_distribution_two_means %>% 
         visualize(obs_stat = obs_diff, direction = "both")
FIGURE 10.8: Shaded histogram to show p-value


Remember that the observed difference in means was 0.95. We have shaded red all values at or above that value and also shaded red those values at or below its negative value (since this is a two-tailed test). By giving obs_stat = obs_diff, a darker vertical line is also shown at 0.95. To better estimate how large the \(p\)-value will be, we also increase the number of bins to 100 here from 20:

      null_distribution_two_means %>% 
         visualize(bins = 100, obs_stat = obs_diff, direction = "both")
FIGURE 10.9: Histogram with vertical lines corresponding to observed statistic

10.7.11 Corresponding confidence interval

percentile_ci_two_means <- movies_genre_sample %>% 
  specify(formula = rating ~ genre) %>% 
  # hypothesize(null = "independence") %>% 
  generate(reps = 5000) %>% 
  calculate(stat = "diff in means", order = c("Romance", "Action")) %>% 
  get_ci()
Setting `type = "bootstrap"` in `generate()`.
percentile_ci_two_means
# A tibble: 1 x 2
  `2.5%` `97.5%`
   <dbl>   <dbl>

10.8 Building theory-based methods

These traditional methods have been used for many decades, back to the time when researchers didn't have access to computers that could run 5000 simulations in a few seconds. They had to base their methods on probability theory instead. Many fields and researchers continue to use these methods and that is the biggest reason for their inclusion here. It's important to remember that a \(t\)-test or a \(z\)-test is really just an approximation of what you have seen in this chapter already using simulation and randomization. The focus here is on understanding how the shape of the \(t\)-curve comes about without digging deeply into the mathematical underpinnings.

      10.8.1 Example: \(t\)-test for two independent samples

What is commonly done in statistics is the process of standardization. What this entails is calculating the mean and standard deviation of a variable. Then you subtract the mean from each value of your variable and divide by the standard deviation. The most common standardization is known as the \(z\)-score. The formula for a \(z\)-score is \[Z = \frac{x - \mu}{\sigma},\] where \(x\) represents the value of a variable, \(\mu\) represents the mean of the variable, and \(\sigma\) represents the standard deviation of the variable. Thus, if your variable has 10 elements, each one has a corresponding \(z\)-score that gives how many standard deviations away that value is from its mean. \(z\)-scores are normally distributed with mean 0 and standard deviation 1. They have the common, bell-shaped pattern seen below.
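As a small side illustration (not from the original text), z-scores are easy to compute in R, either directly or with the built-in scale() function:

# Hypothetical example: standardizing ten values into z-scores
x <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
z <- (x - mean(x)) / sd(x)   # equivalent to as.numeric(scale(x))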

Recall that we hardly ever know the mean and standard deviation of the population of interest. This is almost always the case when considering the means of two independent groups. To help account for us not knowing the population parameter values, we can use the sample statistics instead, but this comes with a bit of a price in terms of complexity.

Another form of standardization occurs when we need to use the sample standard deviations as estimates for the unknown population standard deviations. This standardization is often called the \(t\)-score. For the two independent samples case like what we have for comparing action movies to romance movies, the formula is \[T =\dfrac{ (\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{ \sqrt{\dfrac{{s_1}^2}{n_1} + \dfrac{{s_2}^2}{n_2}} }\]
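To make the formula concrete, here is a hedged sketch of computing \(T\) by hand for the two samples, assuming the movies_genre_sample data frame used earlier and taking \(\mu_1 - \mu_2 = 0\) under the null hypothesis:

# Hypothetical by-hand computation of the two-sample t-statistic
stats_by_genre <- movies_genre_sample %>% 
  group_by(genre) %>% 
  summarize(xbar = mean(rating), s = sd(rating), n = n())
romance <- stats_by_genre %>% filter(genre == "Romance")
action  <- stats_by_genre %>% filter(genre == "Action")
T_stat  <- (romance$xbar - action$xbar) / 
  sqrt(romance$s^2 / romance$n + action$s^2 / action$n)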

      There is a lot to try to unpack here.

      null_slope_distn %>% 
         visualize(obs_stat = slope_obs, direction = "greater")

      In viewing the distribution above with shading to the right of our observed slope value of 0.067, we can see that we expect the p-value to be quite small. Let’s calculate it next using a similar syntax to what was done with visualize().

11.2 Bootstrapping for the regression slope

To further reinforce the process being done in the pipeline, we've added the type argument to generate(). This is automatically added based on the entries for specify() and hypothesize(), but it provides a useful way to check to make sure generate() is creating the samples in the desired way. In this case, we permuted the values of one variable across the values of the other 10,000 times and calculated a "slope" coefficient for each of these 10,000 generated samples.

      If instead we’d like to get a range of plausible values for the true slope value, we can use the process of bootstrapping:

bootstrap_slope_distn <- evals %>% 
  specify(score ~ bty_avg) %>%
  generate(reps = 10000, type = "bootstrap") %>% 
  calculate(stat = "slope")
bootstrap_slope_distn %>% visualize()

      Next we can use the get_ci() function to determine the confidence interval. Let’s do this in two different ways obtaining 99% confidence intervals. Remember that these denote a range of plausible values for an unknown true population slope parameter regressing teaching score on beauty score.

      percentile_slope_ci <- bootstrap_slope_distn %>% 
         get_ci(level = 0.99, type = "percentile")
11.3.3 Refresher: Regression tables

get_regression_table(score_model_3)

TABLE 11.2: Model 2: Regression table with interaction effect included

      11.3.4 Script of R code

      An R script file of all R code used in this chapter is available here.



      11.4.2 Residual analysis

gapminder2007 %>%
  filter(continent == "Asia") %>%
  arrange(lifeExp)
11.4.3 Residual analysis

  labs(x = "Income (in $1000)", y = "Residual", title = "Residuals vs income")

      FIGURE 11.9: Residuals vs credit limit and income


      ggplot(regression_points, aes(x = residual)) +
         geom_histogram(color = "white") +
         labs(x = "Residual")
      `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
      Relationship between credit card balance and credit limit/income

diff --git a/docs/12-inference-for-regression.html b/docs/12-inference-for-regression.html
deleted file mode 100644


      Chapter 12 Inference for Regression

Note: This chapter is still under construction. If you would like to contribute, please check us out on GitHub at https://github.com/moderndive/moderndive_book.

      Needed packages


      Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). Read Section 2.3 for information on how to install and load R packages.

library(ggplot2)
library(dplyr)
library(moderndive)
library(infer)
library(gapminder)
library(ISLR)

      12.1 Simulation-based Inference for Regression


We can also use the concept of permuting to determine the standard error of our null distribution and conduct a hypothesis test for a population slope. Let's go back to our example on teacher evaluations from Chapters 6 and 7. We'll begin in the basic regression setting to test whether we have evidence that a statistically significant positive relationship exists between teaching and beauty scores for the University of Texas professors. As we did in Chapter 6, teaching score will act as our outcome variable and bty_avg will be our explanatory variable. We will set up this hypothesis testing process as we have done before via the "There is Only One Test" diagram in Figure 11.1 using the infer package.


      12.1.1 Data


      Our data is stored in evals and we are focused on the measurements of the score and bty_avg variables there. Note that we don’t choose a subset of variables here since we will specify() the variables of interest using infer.

evals %>% 
  specify(score ~ bty_avg)
Response: score (numeric)
Explanatory: bty_avg (numeric)
# A tibble: 463 x 2
   score bty_avg
   <dbl>   <dbl>
 1   4.7    5   
 2   4.1    5   
 3   3.9    5   
 4   4.8    5   
 5   4.6    3   
 6   4.3    3   
 7   2.8    3   
 8   4.1    3.33
 9   3.4    3.33
10   4.5    3.17
# … with 453 more rows

      12.1.2 Test statistic \(\delta\)


      Our test statistic here is the sample slope coefficient that we denote with \(b_1\).


      12.1.3 Observed effect \(\delta^*\)


      We can use the specify() %>% calculate() shortcut here to determine the slope value seen in our observed data:

slope_obs <- evals %>% 
  specify(score ~ bty_avg) %>% 
  calculate(stat = "slope")

      The calculated slope value from our observed sample is \(b_1 = 0.067\).


      12.1.4 Model of \(H_0\)


      We are looking to see if a positive relationship exists so \(H_A: \beta_1 > 0\). Our null hypothesis is always in terms of equality so we have \(H_0: \beta_1 = 0\). In other words, when we assume the null hypothesis is true, we are assuming there is NOT a linear relationship between teaching and beauty scores for University of Texas professors.


      12.1.5 Simulated data


      Now to simulate the null hypothesis being true and recreating how our sample was created, we need to think about what it means for \(\beta_1\) to be zero. If \(\beta_1 = 0\), we said above that there is no relationship between the teaching and beauty scores. If there is no relationship, then any one of the teaching score values could have just as likely occurred with any of the other beauty score values instead of the one that it actually did fall with. We, therefore, have another example of permuting in our simulating of data under the null hypothesis.


      Tactile simulation


      We could use a deck of 926 note cards to create a tactile simulation of this permuting process. We would write the 463 different values of beauty scores on each of the 463 cards, one per card. We would then do the same thing for the 463 teaching scores putting them on one per card.


Next, we would lay out each of the 463 beauty score cards and shuffle the teaching score deck. Then, after shuffling the deck well, we would deal the cards out, one on top of each of the beauty score cards. We would then enter these new values in for teaching score and compute a sample slope based on this permutation. We could repeat this process many times, keeping track of our sample slope after each shuffle.
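In code, one round of this shuffling could be mimicked with base R's sample() function; the rough sketch below is an illustration only and is not the infer workflow used in the next subsection:

# Hypothetical single shuffle of the tactile simulation
shuffled_scores    <- sample(evals$score)                           # shuffle teaching scores
one_permuted_slope <- coef(lm(shuffled_scores ~ evals$bty_avg))[2]  # slope after shuffling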


      12.1.6 Distribution of \(\delta\) under \(H_0\)


      We can build our null distribution in much the same way we did in Chapter 11 using the generate() and calculate() functions. Note also the addition of the hypothesize() function, which lets generate() know to perform the permuting instead of bootstrapping.

null_slope_distn <- evals %>% 
  specify(score ~ bty_avg) %>%
  hypothesize(null = "independence") %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "slope")

null_slope_distn %>% 
  visualize(obs_stat = slope_obs, direction = "greater")


      In viewing the distribution above with shading to the right of our observed slope value of 0.067, we can see that we expect the p-value to be quite small. Let’s calculate it next using a similar syntax to what was done with visualize().


      12.1.7 The p-value

null_slope_distn %>% 
  get_pvalue(obs_stat = slope_obs, direction = "greater")
# A tibble: 1 x 1
  p_value
    <dbl>
1       0

      Since 0.067 falls far to the right of this plot beyond where any of the histogram bins have data, we can say that we have a \(p\)-value of 0. We, thus, have evidence to reject the null hypothesis in support of there being a positive association between the beauty score and teaching score of University of Texas faculty members.

Learning check

      (LC11.1) Repeat the inference above but this time for the correlation coefficient instead of the slope. Note the implementation of stat = "correlation" in the calculate() function of the infer package.


      12.2 Bootstrapping for the regression slope


      With the p-value calculated as 0 in the hypothesis test above, we can next determine just how strong of a positive slope value we might expect between the variables of teaching score and beauty score (bty_avg) for University of Texas faculty. Recall the infer pipeline above to compute the null distribution. Recall that this assumes the null hypothesis is true that there is no relationship between teaching score and beauty score using the hypothesize() function.

null_slope_distn <- evals %>% 
  specify(score ~ bty_avg) %>%
  hypothesize(null = "independence") %>% 
  generate(reps = 10000, type = "permute") %>% 
  calculate(stat = "slope")

To further reinforce the process being done in the pipeline, we've added the type argument to generate(). This is automatically added based on the entries for specify() and hypothesize(), but it provides a useful way to check to make sure generate() is creating the samples in the desired way. In this case, we permuted the values of one variable across the values of the other 10,000 times and calculated a "slope" coefficient for each of these 10,000 generated samples.


      If instead we’d like to get a range of plausible values for the true slope value, we can use the process of bootstrapping:

bootstrap_slope_distn %>% visualize()


      Next we can use the get_ci() function to determine the confidence interval. Let’s do this in two different ways obtaining 99% confidence intervals. Remember that these denote a range of plausible values for an unknown true population slope parameter regressing teaching score on beauty score.

percentile_slope_ci <- bootstrap_slope_distn %>% 
  get_ci(level = 0.99, type = "percentile")
percentile_slope_ci
# A tibble: 1 x 2
  `0.5%` `99.5%`
   <dbl>   <dbl>
1 0.0229   0.110

se_slope_ci <- bootstrap_slope_distn %>% 
  get_ci(level = 0.99, type = "se", point_estimate = slope_obs)
se_slope_ci
# A tibble: 1 x 2
   lower upper
   <dbl> <dbl>
1 0.0220 0.111

      With the bootstrap distribution being close to symmetric, it makes sense that the two resulting confidence intervals are similar.
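As a sanity check, the standard-error method can also be sketched by hand, assuming a roughly bell-shaped bootstrap distribution and using 2.58 as the multiplier for 99% coverage (the observed slope 0.067 comes from above):

# Hypothetical by-hand version of the SE-based 99% interval
se_boot <- sd(bootstrap_slope_distn$stat)
c(lower = 0.067 - 2.58 * se_boot, upper = 0.067 + 2.58 * se_boot)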


      12.3 Inference for multiple regression


      12.3.1 Refresher: Professor evaluations data


      Let’s revisit the professor evaluations data that we analyzed using multiple regression with one numerical and one categorical predictor. In particular

• \(y\): outcome variable of instructor evaluation score
• predictor variables
  • \(x_1\): numerical explanatory/predictor variable of age
  • \(x_2\): categorical explanatory/predictor variable of gender
library(ggplot2)
library(dplyr)
library(moderndive)

evals_multiple <- evals %>%
  select(score, ethnicity, gender, language, age, bty_avg, rank)

First, recall that we had two competing potential models to explain professors' teaching scores:

1. Model 1: No interaction term, i.e. both male and female profs have the same slope describing the associated effect of age on teaching score
2. Model 2: Includes an interaction term, i.e. we allow for male and female profs to have different slopes describing the associated effect of age on teaching score

      12.3.2 Refresher: Visualizations


      Recall the plots we made for both these models:

FIGURE 12.1: Model 1: no interaction effect included

FIGURE 12.2: Model 2: interaction effect included

      12.3.3 Refresher: Regression tables


      Last, let’s recall the regressions we fit. First, the regression with no -interaction effect: note the use of + in the formula in Table 12.1.

score_model_2 <- lm(score ~ age + gender, data = evals_multiple)
get_regression_table(score_model_2)

TABLE 12.1: Model 1: Regression table with no interaction effect included

term        estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept      4.484      0.125      35.79    0.000     4.238     4.730
age           -0.009      0.003      -3.28    0.001    -0.014    -0.003
gendermale     0.191      0.052       3.63    0.000     0.087     0.294

      Second, the regression with an interaction effect: note the use of * in the formula.

score_model_3 <- lm(score ~ age * gender, data = evals_multiple)
get_regression_table(score_model_3)

TABLE 12.2: Model 2: Regression table with interaction effect included

term            estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept          4.883      0.205      23.80    0.000     4.480     5.286
age               -0.018      0.004      -3.92    0.000    -0.026    -0.009
gendermale        -0.446      0.265      -1.68    0.094    -0.968     0.076
age:gendermale     0.014      0.006       2.45    0.015     0.003     0.024

      12.3.4 Script of R code


      An R script file of all R code used in this chapter is available here.


      12.4 Residual analysis


      12.4.1 Residual analysis


      Recall the residuals can be thought of as the error or the “lack-of-fit” between the observed value \(y\) and the fitted value \(\widehat{y}\) on the blue regression line in Figure 6.6. Ideally when we fit a regression model, we’d like there to be no systematic pattern to these residuals. We’ll be more specific as to what we mean by no systematic pattern when we see Figure 12.4 below, but let’s keep this notion imprecise for now. Investigating any such patterns is known as residual analysis and is the theme of this section.


      We’ll perform our residual analysis in two ways:

1. Creating a scatterplot with the residuals on the \(y\)-axis and the original explanatory variable \(x\) on the \(x\)-axis.
2. Creating a histogram of the residuals, thereby showing the distribution of the residuals.

      First, recall in Figure 6.8 above we created a scatterplot where

• on the vertical axis we had the teaching score \(y\),
• on the horizontal axis we had the beauty score \(x\), and
• the blue arrow represented the residual for one particular instructor.

      Instead, in Figure 12.3 below, let’s create a scatterplot where

• On the vertical axis we have the residual \(y-\widehat{y}\) instead
• On the horizontal axis we have the beauty score \(x\) as before:
# Get data
evals_ch6 <- evals %>%
  select(score, bty_avg, age)
# Fit regression model:
score_model <- lm(score ~ bty_avg, data = evals_ch6)
# Get regression table:
get_regression_table(score_model)
# A tibble: 2 x 7
  term      estimate std_error statistic p_value lower_ci upper_ci
  <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
1 intercept    3.88      0.076     51.0        0    3.73     4.03 
2 bty_avg      0.067     0.016      4.09       0    0.035    0.099
# Get regression points
regression_points <- get_regression_points(score_model)

ggplot(regression_points, aes(x = bty_avg, y = residual)) +
  geom_point() +
  labs(x = "Beauty Score", y = "Residual") +
  geom_hline(yintercept = 0, col = "blue", size = 1)

FIGURE 12.3: Plot of residuals over beauty score

      You can think of Figure 12.3 as Figure 6.8 but with the blue line flattened out to \(y=0\). Does it seem like there is no systematic pattern to the residuals? This question is rather qualitative and subjective in nature, thus different people may respond with different answers to the above question. However, it can be argued that there isn’t a drastic pattern in the residuals.


      Let’s now get a little more precise in our definition of no systematic pattern in the residuals. Ideally, the residuals should behave randomly. In addition,

1. the residuals should be on average 0. In other words, sometimes the regression model will make a positive error in that \(y - \widehat{y} > 0\), sometimes the regression model will make a negative error in that \(y - \widehat{y} < 0\), but on average the error is 0.
2. Further, the value and spread of the residuals should not depend on the value of \(x\).

      In Figure 12.4 below, we display some hypothetical examples where there are drastic patterns to the residuals. In Example 1, the value of the residual seems to depend on \(x\): the residuals tend to be positive for small and large values of \(x\) in this range, whereas values of \(x\) more in the middle tend to have negative residuals. In Example 2, while the residuals seem to be on average 0 for each value of \(x\), the spread of the residuals varies for different values of \(x\); this situation is known as heteroskedasticity.

FIGURE 12.4: Examples of less than ideal residual patterns


      The second way to perform a residual analysis is to look at the histogram of the residuals:

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(binwidth = 0.25, color = "white") +
  labs(x = "Residual")

FIGURE 12.5: Histogram of residuals


      This histogram seems to indicate that we have more positive residuals than negative. Since the residual \(y-\widehat{y}\) is positive when \(y > \widehat{y}\), it seems our fitted teaching score from the regression model tends to underestimate the true teaching score. This histogram has a slight left-skew in that there is a long tail on the left. Another way to say this is this data exhibits a negative skew. Is this a problem? Again, there is a certain amount of subjectivity in the response. In the authors’ opinion, while there is a slight skew/pattern to the residuals, it isn’t a large concern. On the other hand, others might disagree with our assessment. Here are examples of an ideal and less than ideal pattern to the residuals when viewed in a histogram:

FIGURE 12.6: Examples of ideal and less than ideal residual patterns


      In fact, we’ll see later on that we would like the residuals to be normally distributed with -mean 0. In other words, be bell-shaped and centered at 0! While this requirement and residual analysis in general may seem to some of you as not being overly critical at this point, we’ll see later after when we cover inference for regression in Chapter 12 that for the last five columns of the regression table from earlier (std error, statistic, p_value,lower_ci, and upper_ci) to have valid interpretations, the above three conditions should roughly hold.

Learning check

      (LC11.2) Continuing with our regression using age as the explanatory variable and teaching score as the outcome variable, use the get_regression_points() function to get the observed values, fitted values, and residuals for all 463 instructors. Perform a residual analysis and look for any systematic patterns in the residuals. Ideally, there should be little to no pattern.


      12.4.2 Residual analysis

# Get data:
gapminder2007 <- gapminder %>%
  filter(year == 2007) %>% 
  select(country, continent, lifeExp, gdpPercap)
# Fit regression model:
lifeExp_model <- lm(lifeExp ~ continent, data = gapminder2007)
# Get regression table:
get_regression_table(lifeExp_model)
# A tibble: 5 x 7
  term              estimate std_error statistic p_value lower_ci upper_ci
  <chr>                <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
1 intercept             54.8      1.02     53.4        0     52.8     56.8
2 continentAmericas     18.8      1.8      10.4        0     15.2     22.4
3 continentAsia         15.9      1.65      9.68       0     12.7     19.2
4 continentEurope       22.8      1.70     13.5        0     19.5     26.2
5 continentOceania      25.9      5.33      4.86       0     15.4     36.4
# Get regression points
regression_points <- get_regression_points(lifeExp_model)

      Recall our discussion on residuals from Section 12.4.1 where our goal was to investigate whether or not there was a systematic pattern to the residuals. Ideally since residuals can be thought of as error, there should be no such pattern. While there are many ways to do such residual analysis, we focused on two approaches based on visualizations.

1. A plot with residuals on the vertical axis and the predictor (in this case continent) on the horizontal axis
2. A histogram of all residuals

      First, let’s plot the residuals versus continent in Figure 12.7, but also let’s plot all 142 points with a little horizontal random jitter by setting the width = 0.1 parameter in geom_jitter():

ggplot(regression_points, aes(x = continent, y = residual)) +
  geom_jitter(width = 0.1) + 
  labs(x = "Continent", y = "Residual") +
  geom_hline(yintercept = 0, col = "blue")

FIGURE 12.7: Plot of residuals over continent


      We observe

1. There seems to be a rough balance of both positive and negative residuals for all 5 continents.
2. However, there is one clear outlier in Asia, which has a residual with the largest deviation away from 0.

      Let’s investigate the 5 countries in Asia with the shortest life expectancy:

gapminder2007 %>%
  filter(continent == "Asia") %>%
  arrange(lifeExp)

TABLE 12.3: Countries in Asia with shortest life expectancy

country      continent  lifeExp  gdpPercap
Afghanistan  Asia          43.8        975
Iraq         Asia          59.5       4471
Cambodia     Asia          59.7       1714
Myanmar      Asia          62.1        944
Yemen, Rep.  Asia          62.7       2281

This was the earlier identified residual for Afghanistan of -26.9. Unfortunately given recent geopolitical turmoil, individuals who live in Afghanistan and, in particular in 2007, have a drastically lower life expectancy.


      Second, let’s look at a histogram of all 142 values of -residuals in Figure 12.8. In this case, the residuals form a -rather nice bell-shape, although there are a couple of very low and very high -values at the tails. As we said previously, searching for patterns in residuals -can be somewhat subjective, but ideally we hope there are no “drastic” patterns.

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(binwidth = 5, color = "white") +
  labs(x = "Residual")

FIGURE 12.8: Histogram of residuals

Learning check

      (LC11.3) Continuing with our regression using gdpPercap as the outcome variable and continent as the explanatory variable, use the get_regression_points() function to get the observed values, fitted values, and residuals for all 142 countries in 2007 and perform a residual analysis to look for any systematic patterns in the residuals. Is there a pattern? Please keep in mind that these types of questions are somewhat subjective and different people will most likely have different answers. The focus should be on being able to justify the conclusions made.


      12.4.3 Residual analysis


Recall in Section 12.4.1, our first residual analysis plot investigated the presence of any systematic pattern in the residuals when we had a single numerical predictor: bty_avg. For the Credit card dataset, since we have two numerical predictors, Limit and Income, we must perform this twice:

# Get data:
Credit <- Credit %>%
  select(Balance, Limit, Income, Rating, Age)
# Fit regression model:
Balance_model <- lm(Balance ~ Limit + Income, data = Credit)
# Get regression table:
get_regression_table(Balance_model)
# A tibble: 3 x 7
  term      estimate std_error statistic p_value lower_ci upper_ci
  <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
1 intercept -385.       19.5       -19.8       0 -423.    -347.   
2 Limit        0.264     0.006      45.0       0    0.253    0.276
3 Income      -7.66      0.385     -19.9       0   -8.42    -6.91 
# Get regression points
regression_points <- get_regression_points(Balance_model)
ggplot(regression_points, aes(x = Limit, y = residual)) +
  geom_point() +
  labs(x = "Credit limit (in $)", 
       y = "Residual", 
       title = "Residuals vs credit limit")
  
ggplot(regression_points, aes(x = Income, y = residual)) +
  geom_point() +
  labs(x = "Income (in $1000)", 
       y = "Residual", 
       title = "Residuals vs income")

FIGURE 12.9: Residuals vs credit limit and income


In this case, there does appear to be a systematic pattern to the residuals, as the scatter of the residuals around the line \(y=0\) is definitely not consistent. This behavior of the residuals is further evidenced by the histogram of residuals in Figure 12.10. We observe that the residuals have a slight right-skew (recall we say that data is right-skewed, or positively-skewed, if there is a tail to the right). Ideally, these residuals should be bell-shaped around a residual value of 0.

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(color = "white") +
  labs(x = "Residual")

FIGURE 12.10: Relationship between credit card balance and credit limit/income


      Another way to interpret this histogram is that since the residual is computed as \(y - \widehat{y}\) = balance - balance_hat, we have some values where the fitted value \(\widehat{y}\) is very much lower than the observed value \(y\). In other words, we are underestimating certain credit card holders’ balances by a very large amount.
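One way to see this concretely (a hedged sketch, assuming the regression_points data frame from above with its residual column) is to list the cases with the largest positive residuals, i.e. the most underestimated balances:

# Hypothetical check: credit card holders whose balance is most underestimated
regression_points %>% 
  arrange(desc(residual)) %>% 
  head(5)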

Learning check

      (LC11.4) Continuing with our regression using Rating and Age as the explanatory variables and credit card Balance as the outcome variable, use the get_regression_points() function to get the observed values, fitted values, and residuals for all 400 credit card holders. Perform a residual analysis and look for any systematic patterns in the residuals.


      12.4.4 Residual analysis

# Get data:
evals_ch7 <- evals %>%
  select(score, age, gender)
# Fit regression model:
score_model_2 <- lm(score ~ age + gender, data = evals_ch7)
# Get regression table:
get_regression_table(score_model_2)
# A tibble: 3 x 7
  term       estimate std_error statistic p_value lower_ci upper_ci
  <chr>         <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
1 intercept     4.48      0.125     35.8    0        4.24     4.73 
2 age          -0.009     0.003     -3.28   0.001   -0.014   -0.003
3 gendermale    0.191     0.052      3.63   0        0.087    0.294
# Get regression points
regression_points <- get_regression_points(score_model_2)

      As always, let’s perform a residual analysis first with a histogram, which we can facet by gender:

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(binwidth = 0.25, color = "white") +
  labs(x = "Residual") +
  facet_wrap(~gender)

FIGURE 12.11: Interaction model histogram of residuals


      Second, the residuals as compared to the predictor variables:

• \(x_1\): numerical explanatory/predictor variable of age
• \(x_2\): categorical explanatory/predictor variable of gender
ggplot(regression_points, aes(x = age, y = residual)) +
  geom_point() +
  labs(x = "age", y = "Residual") +
  geom_hline(yintercept = 0, col = "blue", size = 1) +
  facet_wrap(~ gender)

FIGURE 12.12: Interaction model residuals vs predictor

diff --git a/docs/12-thinking-with-data.html b/docs/12-thinking-with-data.html
12.2 Case study: Effective data storytelling


      Note: This section is still under construction. If you would like to contribute, please check us out on GitHub at https://github.com/moderndive/moderndive_book.


      As we’ve progressed throughout this book, you’ve seen how to work with data in a variety of ways. You’ve learned effective strategies for plotting data by understanding which types of plots work best for which combinations of variable types. You’ve summarized data in table form and calculated summary statistics for a variety of different variables. Further, you’ve seen the value of inference as a process to come to conclusions about a population by using a random sample. Lastly, you’ve explored how to use linear regression and the importance of checking the conditions required to make it a valid procedure. All throughout, you’ve learned many computational techniques and focused on reproducible research in writing R code. We now present another case study, but this time of the “effective data storytelling” done by data journalists around the world. Great data stories don’t mislead the reader, but rather engulf them in understanding the importance that data plays in our lives through the captivation of storytelling.


      12.2.2 US Births in 1999

      ggplot(US_births_1999, aes(x = date, y = births)) +
         geom_line() +
         labs(x = "Data", y = "Number of births", title = "US Births in 1999")

We see a big valley occurring just before January 1st, 2000, most likely due to the holiday season. However, what about the major peak of over 14,000 births occurring just before October 1st, 1999? What could be the reason for this anomalously high spike in births? Time to think with data!

      12.2.3 Other examples

      Stand by!


      12.2.4 Script of R code

      An R script file of all R code used in this chapter is available here.

      diff --git a/docs/2-getting-started.html b/docs/2-getting-started.html index 597e7ed2c..7a092a09e 100644 --- a/docs/2-getting-started.html +++ b/docs/2-getting-started.html @@ -6,20 +6,20 @@ Chapter 2 Getting Started with Data in R | Statistical Inference via Data Science - + - + - + @@ -214,9 +214,10 @@

          2.5.1 Additional resources

          2.5.2 What’s to come?

          As we stated earlier however, the best way to learn R is to learn by doing. We now start the “data science” portion of the book in Chapter 3 with what we feel is the most important tool in a data scientist’s toolbox: data visualization. We will continue to explore the data included in the nycflights13 package through data visualization. We’ll see that data visualization is a powerful tool to add to our toolbox for data exploring that provides additional insight to what the View() and glimpse() functions can provide.


FIGURE 2.1: ModernDive flowchart

diff --git a/docs/3-viz.html b/docs/3-viz.html


              Chapter 3 Data Visualization

              We begin the development of your data science toolbox with data visualization. By visualizing our data, we gain valuable insights that we couldn’t initially see from just looking at the raw data in spreadsheet form. We will use the ggplot2 package as it provides an easy way to customize your plots. ggplot2 is rooted in the data visualization theory known as The Grammar of Graphics (Wilkinson 2005).


              At the most basic level, graphics/plots/charts (we use these terms interchangeably in this book) provide a nice way for us to get a sense for how quantitative variables compare in terms of their center (where the values tend to be located) and their spread (how they vary around the center). Graphics should be designed to emphasize the findings and insight you want your audience to understand. This does however require a balancing act. On the one hand, you want to highlight as many meaningful relationships and interesting findings as possible; on the other you don’t want to include so many as to overwhelm your audience.

              As we will see, plots/graphics also help us to identify patterns and outliers in our data. We will see that a common extension of these ideas is to compare the distribution of one quantitative variable (i.e., what the spread of a variable looks like or how the variable is distributed in terms of its values) as we go across the levels of a different categorical variable.

              Needed packages


              3.1 The Grammar of Graphics


              We begin with a discussion of a theoretical framework for data visualization known as “The Grammar of Graphics,” which serves as the foundation for the ggplot2 package. Think of how we construct sentences in English to form sentences by combining different elements, like nouns, verbs, particles, subjects, objects, etc. However, we can’t just combine these elements in any arbitrary order; we must do so following a set of rules known as a linguistic grammar. Similarly to a linguistic grammar, “The Grammar of Graphics” define a set of rules for constructing statistical graphics by combining different types of layers. This grammar was created by Leland Wilkinson (Wilkinson 2005) and has been implemented in a variety of data visualization software including R.

              3.1.1 Components of the Grammar

              In short, the grammar tells us that:


              3.1.3 Other components


              Other more complex components like scales and coordinate systems are left for a more advanced text such as R for Data Science (Grolemund and Wickham 2016). Generally speaking, the Grammar of Graphics allows for a high degree of customization of plots and also a consistent framework for easily updating and modifying them.

              3.1.4 ggplot2 package


              3.4.1 Linegraphs via geom_line

              3.4.2 Summary


              Linegraphs, just like scatterplots, display the relationship between two numerical variables. However it is preferred to use linegraphs over scatterplots when the variable on the x-axis (i.e. the explanatory variable) has an inherent ordering, like some notion of time.



              3.5 5NG#3: Histograms

              The remaining bins all have a similar interpretation.

              3.5.1 Histograms via geom_histogram


              Let’s now present the ggplot() code to plot your first histogram! Unlike with scatterplots and linegraphs, there is now only one variable being mapped in aes(): the single numerical variable temp. The y-aesthetic of a histogram gets computed for you automatically. Furthermore, the geometric object layer is now a geom_histogram()

              ggplot(data = weather, mapping = aes(x = temp)) +
                 geom_histogram()
              `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

              3.5.2 Adjusting the bins

              Using the first method, we have the power to specify how many bins we would like to cut the x-axis up in. As mentioned in the previous section, the default number of bins is 30. We can override this default, to say 40 bins, as follows:

              ggplot(data = weather, mapping = aes(x = temp)) +
                 geom_histogram(bins = 40, color = "white")
FIGURE 3.14: Histogram with 40 bins.

              Using the second method, instead of specifying the number of bins, we specify the width of the bins by using the binwidth argument in the geom_histogram() layer. For example, let’s set the width of each bin to be 10°F.

              ggplot(data = weather, mapping = aes(x = temp)) +
                 geom_histogram(binwidth = 10, color = "white")

              FIGURE 3.15: Histogram with binwidth 10.


              3.5.3 Summary

              3.6 Facets


              Before continuing the 5NG, let’s briefly introduce a new concept called faceting. Faceting is used when we’d like to split a particular visualization of variables by another variable. This will create multiple copies of the same type of plot with matching x and y axes, but whose content will differ.

For example, suppose we were interested in looking at how the histogram of hourly temperature recordings at the three NYC airports we saw in Section 3.5 differed by month. We would “split” this histogram by the 12 possible months in a given year, in other words plot histograms of temp for each month. We do this by adding a facet_wrap(~ month) layer.

              ggplot(data = weather, mapping = aes(x = temp)) +
   geom_histogram(binwidth = 5, color = "white") +
   facet_wrap(~ month)

              FIGURE 3.16: Faceted histogram.


              Note the use of the tilde ~ before month in facet_wrap(). The tilde is required and you’ll receive the error Error in as.quoted(facets) : object 'month' not found if you don’t include it before month here. We can also specify the number of rows and columns in the grid by using the nrow and ncol arguments inside of facet_wrap(). For example, say we would like our faceted plot to have 4 rows instead of 3. Add the nrow = 4 argument to facet_wrap(~ month)

              ggplot(data = weather, mapping = aes(x = temp)) +
                 geom_histogram(binwidth = 5, color = "white") +
                 facet_wrap(~ month, nrow = 4)

              3.7.1 Boxplots via geom_boxplot
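For reference, here is a minimal sketch of the kind of boxplot these learning checks refer to, with temp split by month converted to a categorical variable via factor(); this mirrors the description in LC3.24 below, though the exact styling of the book's figure may differ:

ggplot(data = weather, mapping = aes(x = factor(month), y = temp)) +
   geom_boxplot()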

              (LC3.22) What does the dot at the bottom of the plot for May correspond to? Explain what might have occurred in May to produce this point.

              (LC3.23) Which months have the highest variability in temperature? What reasons can you give for this?


(LC3.24) We looked at the distribution of the numerical variable temp split by the numerical variable month that we converted to a categorical variable using the factor() function. Why would a boxplot of temp split by the numerical variable pressure similarly converted to a categorical variable using the factor() function not be informative?

              (LC3.25) Boxplots provide a simple way to identify outliers. Why may outliers be easier to identify when looking at a boxplot instead of a faceted histogram?


              3.8.2 Must avoid pie charts!

            While it is quite difficult to answer these questions when looking at the pie chart in Figure 3.27, we can much more easily answer these questions using the barchart in Figure 3.26. This is true since barplots present the information in a way such that comparisons between categories can be made with single horizontal lines, whereas pie charts present the information in a way such that comparisons between categories must be made by comparing angles.

            There may be one exception of a pie chart not to avoid courtesy Nathan Yau at FlowingData.com, but we will leave this for the reader to decide:

            The only good pie chart

FIGURE 3.28: The only good pie chart

3.8.3 Two categorical variables

Barplots are the go-to way to visualize the frequency of different categories, or levels, of a single categorical variable. Another use of barplots is to visualize the joint distribution of two categorical variables at the same time. Let’s examine the joint distribution of outgoing domestic flights from NYC by carrier and origin, or in other words the number of flights for each carrier and origin combination. For example, the number of WestJet flights from JFK, the number of WestJet flights from LGA, the number of WestJet flights from EWR, the number of American Airlines flights from JFK, and so on. Recall the ggplot() code that created the barplot of carrier frequency in Figure 3.26:

            ggplot(data = flights, mapping = aes(x = carrier)) +
               geom_bar()

            We can now map the additional variable origin by adding a fill = origin inside the aes() aesthetic mapping; the fill aesthetic of any bar corresponds to the color used to fill the bars.

            ggplot(data = flights, mapping = aes(x = carrier, fill = origin)) +
               geom_bar()

Another alternative to stacked barplots is the side-by-side barplot, also known as a dodged barplot. The code to create a side-by-side barplot is identical to the code to create a stacked barplot, but with a position = "dodge" argument added to geom_bar(). In other words, we are overriding the default barplot type, which is a stacked barplot, and specifying it to be a side-by-side barplot.

            ggplot(data = flights, mapping = aes(x = carrier, fill = origin)) +
               geom_bar(position = "dodge")

            FIGURE 3.31: Side-by-side AKA dodged barplot comparing the number of flights by carrier and origin.


            3.9.4 What’s to come

ggplot(data = early_january_weather, mapping = aes(x = time_hour, y = temp)) + 
   geom_line()

            These two code segments were a preview of Chapter 4 on data wrangling where we’ll delve further into the dplyr package. Data wrangling is the process of transforming and modifying existing data with the intent of making it more appropriate for analysis purposes. For example, the two code segments used the filter() function to create new data frames (alaska_flights and early_january_weather) by choosing only a subset of rows of existing data frames (flights and weather). In this next chapter, we’ll formally introduce the filter() and other data wrangling functions as well as the pipe operator %>% which allows you to combine multiple data wrangling actions into a single sequential chain of actions. On to Chapter 4 on data wrangling!

diff --git a/docs/4-wrangling.html b/docs/4-wrangling.html
index 1fff7858d..372aee794 100644
--- a/docs/4-wrangling.html
+++ b/docs/4-wrangling.html

                Chapter 4 Data Wrangling


So far in our journey, we’ve seen how to look at data saved in data frames using the glimpse() and View() functions in Chapter 2 and how to create data visualizations using the ggplot2 package in Chapter 3. In particular we studied what we term the “five named graphs” (5NG):

1. scatterplots via geom_point()
2. linegraphs via geom_line()
3. boxplots via geom_boxplot()
4. histograms via geom_histogram()
5. barplots via geom_bar() or geom_col()

We created these visualizations using the “Grammar of Graphics”, which maps variables in a data frame to the aesthetic attributes of one of the above 5 geometric objects. We can also control other aesthetic attributes of the geometric objects such as the size and color as seen in the Gapminder data example in Figure 3.1.
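As an illustrative sketch of mapping extra aesthetics like color and size (here gapminder_2007 is a hypothetical data frame of 2007 observations; the variable names gdpPercap, lifeExp, pop, and continent come from the gapminder package):

ggplot(data = gapminder_2007, 
       mapping = aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
   geom_point()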


                Recall however in Section 3.9.4 we discussed that for two of our visualizations we needed transformed/modified versions of existing data frames. Recall for example the scatterplot of departure and arrival delay only for Alaska Airlines flights. In order to create this visualization, we needed to first pare down the flights data frame to a new data frame alaska_flights consisting of only carrier == "AS" flights using the filter() function.

                alaska_flights <- flights %>% 
                   filter(carrier == "AS")
                 

                In this chapter, we’ll introduce a series of functions from the dplyr package that will allow you to take a data frame and

1. filter() its existing rows to only pick out a subset of them. For example, the alaska_flights data frame above.
2. summarize() one of its columns/variables with a summary statistic. Examples include the median and interquartile range of temperatures as we saw in Section 3.7 on boxplots.
3. group_by() its rows. In other words assign different rows to be part of the same group and report summary statistics for each group separately. For example, say perhaps you don’t want a single overall average departure delay dep_delay for all three origin airports combined, but rather three separate average departure delays, one for each of the three origin airports.
4. mutate() its existing columns/variables to create new ones. For example, convert hourly temperature recordings from °F to °C.
5. arrange() its rows. For example, sort the rows of weather in ascending or descending order of temp.
6. join() it with another data frame by matching along a “key” variable. In other words, merge these two data frames together.

                Notice how we used computer code font to describe the actions we want to take on our data frames. This is because the dplyr package for data wrangling that we’ll introduce in this chapter has intuitively verb-named functions that are easy to remember.


                We’ll start by introducing the pipe operator %>%, which allows you to combine multiple data wrangling verb-named functions into a single sequential chain of actions.

                Needed packages

                Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). If needed, read Section 2.3 for information on how to install and load R packages.
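A sketch of the package-loading code implied here, assuming the chapter relies on the dplyr, ggplot2, and nycflights13 packages used throughout:

library(dplyr)
library(ggplot2)
library(nycflights13)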


                4.1 The pipe operator: %>%


                Before we start data wrangling, let’s first introduce a very nifty tool that gets loaded along with the dplyr package: the pipe operator %>%. Say you would like to perform a hypothetical sequence of operations on a hypothetical data frame x using hypothetical functions f(), g(), and h():

1. Take x then
2. Use x as an input to a function f() then
3. Use the output of f(x) as an input to a function g() then
4. Use the output of g(f(x)) as an input to a function h()

                One way to achieve this sequence of operations is by using nesting parentheses as follows:

                h(g(f(x)))

                The above code isn’t so hard to read since we are applying only three functions: f(), then g(), then h(). However, you can imagine that this can get progressively harder and harder to read as the number of functions applied in your sequence increases. This is where the pipe operator %>% comes in handy. %>% takes one output of one function and then “pipes” it to be the input of the next function. Furthermore, a helpful trick is to read %>% as “then.” For example, you can obtain the same output as the above sequence of operations as follows:

x %>% 
   f() %>% 
   g() %>% 
   h()

You would read this sequence as:

• Take x then
• Use this output as the input to the next function f() then
• Use this output as the input to the next function g() then
• Use this output as the input to the next function h()
              • -

                So while both approaches above would achieve the same goal, the latter is much more human-readable because you can read the sequence of operations line-by-line. But what are x, f(), g(), and h()? Throughout this chapter on data wrangling:

                +

                So while both approaches above would achieve the same goal, the latter is much more human-readable because you can read the sequence of operations line-by-line. But what are the hypothetical x, f(), g(), and h()? Throughout this chapter on data wrangling:

• The starting value x will be a data frame. For example: flights.
• The sequence of functions, here f(), g(), and h(), will be a sequence of any number of the 6 data wrangling verb-named functions we listed in the introduction to this chapter. For example: filter(carrier == "AS").
• The result will be the transformed/modified data frame that you want. For example: a data frame consisting of only the subset of rows in flights corresponding to Alaska Airlines flights.

                  Much like when adding layers to a ggplot() using the + sign at the end of lines, you form a single chain of data wrangling operations by combining verb-named functions into a single sequence with pipe operators %>% at the end of lines. So continuing our example involving Alaska Airlines flights, we form a chain using the pipe operator %>% and save the resulting data frame in alaska_flights:

                  alaska_flights <- flights %>% 
                     filter(carrier == "AS")

Keep in mind, there are many more advanced data wrangling functions than just the 6 listed in the introduction to this chapter; you’ll see some examples of these in Section 4.8. However, just with these 6 verb-named functions you’ll be able to perform a broad array of data wrangling tasks for the rest of this book.



                4.2 filter rows

                FIGURE 4.1: Diagram of


The filter() function here works much like the “Filter” option in Microsoft Excel; it allows you to specify criteria about the values of variables in your dataset and then keeps only the rows that match those criteria. We begin by focusing only on flights from New York City to Portland, Oregon. The dest code (or airport code) for Portland, Oregon is "PDX". Run the following and look at the resulting spreadsheet to ensure that only flights heading to Portland are chosen here:

                portland_flights <- flights %>% 
                   filter(dest == "PDX")
                 View(portland_flights)

• The ordering of the commands:
  • Take the flights data frame flights then
  • filter the data frame so that only those rows where the dest equals "PDX" are included.
• We test for equality using the double equal sign == and not a single equal sign =. In other words filter(dest = "PDX") will yield an error. This is a convention across many programming languages. If you are new to coding, you’ll probably forget to use the double equal sign == a few times before you get the hang of it.

You can use other mathematical operations beyond just == to form criteria:

• > corresponds to “greater than”
• < corresponds to “less than”
• >= corresponds to “greater than or equal to”
• <= corresponds to “less than or equal to”
• != corresponds to “not equal to”. The ! is used in many programming languages to indicate “not”.

Furthermore, you can combine multiple criteria together using operators that make comparisons:

• | corresponds to “or”
• & corresponds to “and”

To see many of these in action, let’s filter flights for all rows that:

• Departed from JFK airport and
• Were heading to Burlington, Vermont ("BTV") or Seattle, Washington ("SEA") and
• Departed in the months of October, November, or December.

Run the following:

btv_sea_flights_fall <- flights %>% 
  filter(origin == "JFK" & (dest == "BTV" | dest == "SEA") & month >= 10)
                 View(btv_sea_flights_fall)

Note that even though colloquially speaking one might say “all flights heading to Burlington, Vermont and Seattle, Washington,” in terms of computer operations, we really mean “all flights heading to Burlington, Vermont or to Seattle, Washington.” For a given row in the data, dest can be “BTV”, “SEA”, or something else, but not “BTV” and “SEA” at the same time. Furthermore, note the careful use of parentheses around the dest == "BTV" | dest == "SEA".


                We can often skip the use of & and just separate our conditions with a comma. In other words the code above will return the identical output btv_sea_flights_fall as this code below:

btv_sea_flights_fall <- flights %>% 
  filter(origin == "JFK", (dest == "BTV" | dest == "SEA"), month >= 10)
View(btv_sea_flights_fall)

                Let’s present another example that uses the ! “not” operator to pick rows that don’t match a criteria. As mentioned earlier, the ! can be read as “not.” Here we are filtering rows corresponding to flights that didn’t go to Burlington, VT or Seattle, WA.

                not_BTV_SEA <- flights %>% 
                   filter(!(dest == "BTV" | dest == "SEA"))
                 View(not_BTV_SEA)

                Again, note the careful use of parentheses around the (dest == "BTV" | dest == "SEA"). If we didn’t use parentheses as follows:

flights %>% 
  filter(!dest == "BTV" | dest == "SEA")

                We would be returning all flights not headed to "BTV" or those headed to "SEA", which is an entirely different resulting data frame.

                Now say we have a large list of airports we want to filter for, say BTV, SEA, PDX, SFO, and BDL. We could continue to use the | or operator as so:

                many_airports <- flights %>% 
                   filter(dest == "BTV" | dest == "SEA" | dest == "PDX" | dest == "SFO" | dest == "BDL")
                 View(many_airports)

                but as we progressively include more airports, this will get unwieldy. A slightly shorter approach uses the %in% operator:

                many_airports <- flights %>% 
                   filter(dest %in% c("BTV", "SEA", "PDX", "SFO", "BDL"))
                 View(many_airports)

This code filters flights for all rows where dest is in the vector of airports c("BTV", "SEA", "PDX", "SFO", "BDL"). Recall from Chapter 2 that the c() function “combines” or “concatenates” values into a vector. Both outputs of many_airports are the same, but as you can see the latter takes much less time to code.
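As a quick illustration of how %in% behaves on its own (the values here are illustrative), it checks each element of the left-hand vector for membership in the right-hand vector and returns TRUE or FALSE:

c("BTV", "ORD") %in% c("BTV", "SEA", "PDX", "SFO", "BDL")
[1]  TRUE FALSE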


As a final note we point out that filter() should often be among the first verbs you apply to your data. This cleans your dataset to only those rows you care about, or put differently, it narrows down the scope of your data frame to just the observations you care about.

                Learning check


                (LC4.1) What’s another way of using the “not” operator ! to filter only the rows that are not going to Burlington VT nor Seattle WA in the flights data frame? Test this out using the code above.


                4.3 summarize variables


                The next common task when working with data is to return summary statistics: a single numerical value that summarizes a large number of values, for example the mean/average or the median. Other examples of summary statistics that might not immediately come to mind include the sum, the smallest value AKA the minimum, the largest value AKA the maximum, and the standard deviation; they are all summaries of a large number of values.

FIGURE 4.2: Summarize diagram from Data Wrangling with dplyr and tidyr cheatsheet


FIGURE 4.3: Another summarize diagram from Data Wrangling with dplyr and tidyr cheatsheet


Let’s calculate the mean and the standard deviation of the temperature variable temp in the weather data frame included in the nycflights13 package (see Appendix A). We’ll do this in one step using the summarize() function from the dplyr package and save the results in a new data frame summary_temp with columns/variables mean and std_dev. Note you can also use the UK spelling of “summarise” using the summarise() function.


                As shown in Figures 4.2 and 4.3, the weather data frame’s many rows will be collapsed into a single row of just the summary values, in this case the mean and standard deviation:

                summary_temp <- weather %>% 
  summarize(mean = mean(temp), std_dev = sd(temp))
                 summary_temp
                # A tibble: 1 x 2
                    mean std_dev
                   <dbl>   <dbl>
                 1    NA      NA

Why are the values returned NA? As we saw in Section 3.3.1 when creating the scatterplot of departure and arrival delays for alaska_flights, NA is how R encodes missing values where NA indicates “not available” or “not applicable.” If a value for a particular row and a particular column does not exist, NA is stored instead. Values can be missing for many reasons. Perhaps the data was collected but someone forgot to enter it? Perhaps the data was not collected at all because it was too difficult? Perhaps there was an erroneous value that someone entered that has been corrected to read as missing? You’ll often encounter issues with missing values when working with real data.


                Going back to our summary_temp output above, by default any time you try to calculate a summary statistic of a variable that has one or more NA missing values in R, then NA is returned. To work around this fact, you can set the na.rm argument to TRUE, where rm is short for “remove”; this will ignore any NA missing values and only return the summary value for all non-missing values.
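A tiny illustration of this behavior on a toy vector (the values here are illustrative):

mean(c(1, 2, NA))
[1] NA
mean(c(1, 2, NA), na.rm = TRUE)
[1] 1.5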


The code below computes the mean and standard deviation of all non-missing values of temp. Notice how na.rm = TRUE is used as an argument to the mean() and sd() functions individually, and not to the summarize() function.

                summary_temp <- weather %>% 
                   summarize(mean = mean(temp, na.rm = TRUE), 
                             std_dev = sd(temp, na.rm = TRUE))
                 summary_temp
# A tibble: 1 x 2
   mean std_dev
  <dbl>   <dbl>
1  55.3    17.8

However, one needs to be cautious whenever ignoring missing values as we’ve done above. In the upcoming Learning Checks we’ll consider the possible ramifications of blindly sweeping rows with missing values “under the rug.” This is in fact why the na.rm argument to any summary statistic function in R is set to FALSE by default; in other words, do not ignore rows with missing values by default. R is alerting you to the presence of missing data and you should be mindful of this missingness and any potential causes of this missingness throughout your analysis.


What other summary statistic functions can we use inside the summarize() verb? As seen in Figure 4.3, you can use any function in R that takes many values and returns just one. Here are just a few:

                • mean(): the mean AKA the average
                • sd(): the standard deviation, which is a measure of spread
• median(): the median, i.e. the middle value
• min() and max(): the minimum and maximum values respectively
• IQR(): the interquartile range
• sum(): the total amount when adding multiple numbers
• n(): a count of the number of rows/observations
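For instance, a minimal sketch combining a few of these inside summarize() (the column names min_temp, max_temp, and num_rows are illustrative):

weather %>% 
   summarize(min_temp = min(temp, na.rm = TRUE), 
             max_temp = max(temp, na.rm = TRUE), 
             num_rows = n())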

                  4.4 group_by rows


Say that instead of a single mean temperature for the whole year, you would like 12 mean temperatures, one for each of the 12 months separately. In other words, we would like to compute the mean temperature split by month AKA sliced by month AKA aggregated by month. We can do this by “grouping” temperature observations by the values of another variable, in this case by the 12 values of the variable month. Run the following code:

                summary_monthly_temp <- weather %>% 
                   group_by(month) %>% 
                   summarize(mean = mean(temp, na.rm = TRUE), 
             std_dev = sd(temp, na.rm = TRUE))
summary_monthly_temp


This code is identical to the previous code that created summary_temp, but with an extra group_by(month) added before the summarize(). Grouping the weather dataset by month and then applying the summarize() function yields a data frame that displays the mean and standard deviation of temperature split by the 12 months of the year.


                It is important to note that the group_by() function doesn’t change data frames by itself. Rather it changes the meta-data, or data about the data, specifically the group structure. It is only after we apply the summarize() function that the data frame changes. For example, let’s consider the diamonds data frame included in the ggplot2 package. Run this code, specifically in the console:

diamonds

# A tibble: 53,940 x 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10 0.23  Very Good H     VS1      59.4    61   338  4     4.05  2.39
# … with 53,930 more rows

                Observe that the first line of the output reads # A tibble: 53,940 x 10. This is an example of meta-data, in this case the number of observations/rows and variables/columns in diamonds. The actual data itself are the subsequent table of values.


                Now let’s pipe the diamonds data frame into group_by(cut). Run this code, specifically in the console:

diamonds %>% 
  group_by(cut)

# A tibble: 53,940 x 10
# Groups:   cut [5]
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10 0.23  Very Good H     VS1      59.4    61   338  4     4.05  2.39
# … with 53,930 more rows

Observe that now there is additional meta-data: # Groups:   cut [5] indicating that the grouping structure meta-data has been set based on the 5 possible values AKA levels of the categorical variable cut: "Fair", "Good", "Very Good", "Premium", "Ideal". On the other hand observe that the data has not changed: it is still a table of 53,940 × 10 values.


Only by combining a group_by() with another data wrangling operation, in this case summarize(), will the actual data be transformed.

diamonds %>% 
  group_by(cut) %>% 
  summarize(avg_price = mean(price))

# A tibble: 5 x 2
  cut       avg_price
  <ord>         <dbl>
1 Fair          4359.
2 Good          3929.
3 Very Good     3982.
4 Premium       4584.
5 Ideal         3458.

                If we would like to remove this group structure meta-data, we can pipe the resulting data frame into the ungroup() function. Observe how the # Groups: cut [5] meta-data is no longer present. Run this code, specifically in the console:

diamonds %>% 
  group_by(cut) %>% 
  ungroup()

# A tibble: 53,940 x 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10 0.23  Very Good H     VS1      59.4    61   338  4     4.05  2.39
# … with 53,930 more rows

                Let’s now revisit the n() counting summary function we introduced in the previous section. For example, suppose we’d like to count how many flights departed each of the three airports in New York City:

                by_origin <- flights %>% 
                   group_by(origin) %>% 
                   summarize(count = n())
                 by_origin

# A tibble: 3 x 2
  origin  count
  <chr>   <int>
1 EWR    120835
2 JFK    111279
3 LGA    104662

We see that Newark ("EWR") had the most flights departing in 2013 followed by "JFK" and lastly by LaGuardia ("LGA"). Note there is a subtle but important difference between sum() and n(): while sum() returns the sum of a numerical variable, n() returns a count of the number of rows/observations.
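A minimal sketch contrasting the two (distance is a numerical variable in flights; the column names are illustrative): summing distance adds up the mileage of all flights in each group, while n() simply counts the rows in each group.

flights %>% 
   group_by(origin) %>% 
   summarize(num_flights = n(), total_distance = sum(distance))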

                4.4.1 Grouping by more than one variable


You are not limited to grouping by one variable! Say you wanted to know the number of flights leaving each of the three New York City airports for each month. We can do this by also grouping by a second variable month: group_by(origin, month). We see there are 36 rows in by_origin_monthly because there are 12 months times 3 airports (EWR, JFK, and LGA).

                by_origin_monthly <- flights %>% 
                   group_by(origin, month) %>% 
                   summarize(count = n())



                Why do we group_by(origin, month) and not group_by(origin) and then group_by(month)? Let’s investigate:

                by_origin_monthly_incorrect <- flights %>% 
                   group_by(origin) %>% 
                   group_by(month) %>% 
   summarize(count = n())
by_origin_monthly_incorrect



                What happened here is that the second group_by(month) overrode the group structure meta-data of the first group_by(origin), so that in the end we are only grouping by month. The lesson here is if you want to group_by() two or more variables, you should include all these variables in a single group_by() function call.

Learning check

4.5 mutate existing variables

FIGURE 4.5: Mutate diagram from Data Wrangling with dplyr and tidyr cheatsheet


Another common transformation of data is to create/compute new variables based on existing ones. For example, say you are more comfortable thinking of temperature in degrees Celsius °C and not degrees Fahrenheit °F. The formula to convert temperatures from °F to °C is:


\[
\text{temp in C} = \frac{\text{temp in F} - 32}{1.8}
\]


                We can apply this formula to the temp variable using the mutate() function, which takes existing variables and mutates them to create new ones.

weather <- weather %>% 
  mutate(temp_in_C = (temp-32)/1.8)
View(weather)

                Note that we have overwritten the original weather data frame with a new version that now includes the additional variable temp_in_C. In other words, the mutate() command outputs a new data frame which then gets saved over the original weather data frame. Furthermore, note how in mutate() we used temp_in_C = (temp-32)/1.8 to create a new variable temp_in_C.


Why did we overwrite the data frame weather instead of assigning the result to a new data frame like weather_new, and on the other hand why did we not overwrite temp, but instead create a new variable called temp_in_C? As a rough rule of thumb, as long as you are not losing original information that you might need later, it’s acceptable practice to overwrite existing data frames. On the other hand, had we used mutate(temp = (temp-32)/1.8) instead of mutate(temp_in_C = (temp-32)/1.8), we would have overwritten the original variable temp and lost its values.
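For completeness, a sketch of the more conservative alternative mentioned above, assigning to a new data frame instead of overwriting (weather_new is the hypothetical name used in the paragraph):

weather_new <- weather %>% 
   mutate(temp_in_C = (temp - 32) / 1.8)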


Let’s compute average monthly temperatures in both °F and °C using group_by() and summarize() code similar to that in the previous section.

summary_monthly_temp <- weather %>% 
  group_by(month) %>% 
  summarize(mean_temp_in_F = mean(temp, na.rm = TRUE), 
            mean_temp_in_C = mean(temp_in_C, na.rm = TRUE))
summary_monthly_temp

# A tibble: 12 x 3
   month mean_temp_in_F mean_temp_in_C
   <dbl>          <dbl>          <dbl>
 1     1           35.6           2.02
 2     2           34.3           1.26
 3     3           39.9           4.38
 4     4           51.7          11.0 
 5     5           61.8          16.6 
 6     6           72.2          22.3 
 7     7           80.1          26.7 
 8     8           74.5          23.6 
 9     9           67.4          19.7 
10    10           60.1          15.6 
11    11           45.0           7.22
12    12           38.4           3.58

                Let’s consider another example. Passengers are often frustrated when their flights depart late, but change their mood a bit if pilots can make up some time during the flight to get them to their destination close to the original arrival time. This is commonly referred to as “gain” and we will create this variable using the mutate() function.

                flights <- flights %>% 
                   mutate(gain = dep_delay - arr_delay)

                Let’s take a look at dep_delay, arr_delay, and the resulting gain variables for the first 5 rows in our new flights data frame:


  dep_delay arr_delay  gain
1         2        11    -9
2         4        20   -16
3         2        33   -31
4        -1       -18    17
5        -6       -25    19

                The flight in the first row departed 2 minutes late but arrived 11 minutes late, so its “gained time in the air” is actually a loss of 9 minutes, hence its gain is -9. Contrast this to the flight in the fourth row which departed a minute early (dep_delay of -1) but arrived 18 minutes early (arr_delay of -18), so its “gained time in the air” is 17 minutes, hence its gain is +17.


                Let’s look at summary measures of this gain variable and even plot it in the form of a histogram:

gain_summary <- flights %>% 
   summarize(min = min(gain, na.rm = TRUE), 
             q1 = quantile(gain, 0.25, na.rm = TRUE), 
             median = quantile(gain, 0.5, na.rm = TRUE), 
             q3 = quantile(gain, 0.75, na.rm = TRUE), 
             max = max(gain, na.rm = TRUE), 
             mean = mean(gain, na.rm = TRUE), 
             sd = sd(gain, na.rm = TRUE), 
             missing = sum(is.na(gain)))

                We’ve recreated the summary function we saw in Chapter 3 here using the summarize function in dplyr.

                ggplot(data = flights, mapping = aes(x = gain)) +
                   geom_histogram(color = "white", bins = 20)

                FIGURE 4.6: Histogram of gain variable


                4.6 arrange and sort rows


                One of the most common tasks people working with data would like to perform is sort the data frame’s rows in alphanumeric order of the values in a variable/column. For example, when calculating a median by hand requires you to first sort the data from the smallest to highest in value and then identify the “middle” value. The dplyr package has a function called arrange() that we will use to sort/reorder a data frame’s rows according to the values of the specified variable. This is often used after we have used the group_by() and summarize() functions as we will see.

                +

                Let’s suppose we were interested in determining the most frequent destination airports for all domestic flights departing from New York City in 2013:

                freq_dest <- flights %>% 
                   group_by(dest) %>% 
                   summarize(num_flights = n())
freq_dest
# A tibble: 105 x 2
   dest  num_flights
   <chr>       <int>
 ...
 9 BGR           375
10 BHM           297
# … with 95 more rows

Observe that by default the rows of the resulting freq_dest data frame are sorted in alphabetical order of the destination dest. Say instead we would like to see the same data, but sorted from the most to the least number of flights num_flights:

                freq_dest %>% 
                   arrange(num_flights)
                # A tibble: 105 x 2
   dest  num_flights
   <chr>       <int>
 ...
 9 JAC            25
10 BZN            36
# … with 95 more rows

This is actually giving us the opposite of what we are looking for: the rows are sorted with the least frequent destination airports displayed first. To switch the ordering to be descending instead of ascending, we use the desc() function, which is short for "descending":

                freq_dest %>% 
                   arrange(desc(num_flights))
                # A tibble: 105 x 2
   dest  num_flights
   <chr>       <int>
 ...
 9 MIA         11728
10 DCA          9705
# … with 95 more rows

                In other words, arrange() sorts in ascending order by default unless you override this default behavior by using desc().
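As a quick illustration (a sketch, not taken from the book's examples), the same pattern applies directly to flights itself: sorting by departure delay in descending order puts the most delayed flights first.

flights %>% 
  arrange(desc(dep_delay))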


                4.7 join data frames

Another common data transformation task is "joining," or "merging," two different datasets. For example, in the flights data frame the variable carrier lists the carrier code for the different flights. While the corresponding airline names for "UA" and "AA" might be somewhat easy to guess (United and American Airlines), what airlines have the codes "VX", "HA", and "B6"? This information is provided in a separate data frame, airlines.

                View(airlines)
We see that in airlines, carrier is the carrier code, while name is the full name of the airline company. Using this table, we can see that "VX", "HA", and "B6" correspond to Virgin America, Hawaiian Airlines, and JetBlue respectively. However, wouldn't it be nice to have all this information in a single data frame instead of two separate data frames? We can do this by "joining," i.e. "merging," the flights and airlines data frames.

Note that the values in the variable carrier in the flights data frame match the values in the variable carrier in the airlines data frame. In this case, we can use the variable carrier as a key variable to match the rows of the two data frames. Key variables are almost always identification variables that uniquely identify the observational units, as we saw in Subsection ??. This ensures that rows in both data frames are appropriately matched during the join. Hadley and Garrett (Grolemund and Wickham 2016) created the following diagram to help us understand how the different datasets are linked by various key variables:

                FIGURE 4.7: Data relationships in nycflights13 from R for Data Science

4.7.1 Matching “key” variable names

In both the flights and airlines data frames, the key variable we want to join/merge/match the rows of the two data frames by has the same name: carrier. We make use of the inner_join() function to join the two data frames, where the rows will be matched by the variable carrier.

                flights_joined <- flights %>% 
                   inner_join(airlines, by = "carrier")
                 View(flights)
                 View(flights_joined)
Observe that the flights and flights_joined data frames are identical except that flights_joined has an additional variable, name, whose values correspond to the airline company names drawn from the airlines data frame.

A visual representation of the inner_join() is given below (Grolemund and Wickham 2016). There are other types of joins available (such as left_join(), right_join(), full_join(), and anti_join()), but the inner_join() will solve nearly all of the problems you'll encounter in this book.

                FIGURE 4.8: Diagram of inner join from R for Data Science
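To make the contrast with the other join types concrete, here is a hypothetical sketch (not from the book's text) using left_join(), which keeps every row of the left-hand data frame even when there is no match, filling the unmatched columns with NA; an inner_join() would drop such rows instead.

flights %>% 
  left_join(airlines, by = "carrier")
# Every row of flights is kept; name would be NA for any carrier code
# not found in airlines (in nycflights13, all carrier codes do match).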


4.7.2 Different “key” variable names

Say instead you are interested in the destinations of all domestic flights departing NYC in 2013 and ask yourself:

                • “What cities are these airports in?”
                • “Is "ORD" Orlando?”
• “Where is "FLL"?”


                The airports data frame contains airport codes:

                View(airports)
However, looking at both the airports and flights data frames and the visual representation of the relations between these data frames in Figure 4.7 above, we see that in:

• the airports data frame, the airport code is in the variable faa
• the flights data frame, the airport codes are in the variables origin and dest
So to join these two data frames so that we can identify the destination cities, for example, our inner_join() operation will use the by = c("dest" = "faa") argument, which allows us to join two data frames where the key variable has a different name in each:

flights_with_airport_names <- flights %>% 
  inner_join(airports, by = c("dest" = "faa"))
View(flights_with_airport_names)
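A hypothetical variant (not shown in the book's text): to look up information about the origin airports instead, we would match flights' origin to airports' faa in the same way.

flights %>% 
  inner_join(airports, by = c("origin" = "faa"))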

                Let’s construct the sequence of commands that computes the number of flights from NYC to each destination, but also includes information about each destination airport:

named_dests <- flights %>%
   group_by(dest) %>%
   summarize(num_flights = n()) %>%
   arrange(desc(num_flights)) %>%
   inner_join(airports, by = c("dest" = "faa")) %>%
   rename(airport_name = name)
named_dests

 ...
 9 MIA         11728 Miami Intl           25.8 -80.3     8    -5 A     America…
10 DCA          9705 Ronald Reagan Wash…  38.9 -77.0    15    -5 A     America…
# … with 91 more rows

In case you didn't know, "ORD" is the airport code of Chicago O'Hare airport and "FLL" is the main airport in Fort Lauderdale, Florida, which we can now see in the airport_name variable in the resulting named_dests data frame.

4.7.3 Multiple “key” variables

Say instead we are in a situation where we need to join by multiple variables. For example, in Figure 4.7 above we see that in order to join the flights and weather data frames, we need more than one key variable: year, month, day, hour, and origin. This is because the combination of these 5 variables acts to uniquely identify each observational unit in the weather data frame: hourly weather recordings at each of the 3 NYC airports.

We achieve this by specifying a vector of key variables to join by, using the c() function for "combine" or "concatenate" that we saw earlier. Note that the individual variable names need to be wrapped in quotation marks:

flights_weather_joined <- flights %>%
  inner_join(weather, by = c("year", "month", "day", "hour", "origin"))
View(flights_weather_joined)

Learning check


                (LC4.14) What surprises you about the top 10 destinations from NYC in 2013?

4.7.4 Normal forms

The data frames included in the nycflights13 package are in a form that minimizes redundancy of data. For example, the flights data frame only saves the carrier code of the airline company; it does not include the actual name of the airline. The first row of flights has carrier equal to UA, but it does not include the airline name "United Air Lines Inc." The names of the airline companies are included in the name variable of the airlines data frame. In order to have the airline company name included in flights, we could join these two data frames as follows:

joined_flights <- flights %>% 
  inner_join(airlines, by = "carrier")
View(joined_flights)

We are capable of performing this join because each of the data frames has a key in common to relate one to another: the carrier variable in both the flights and airlines data frames. The key variable(s) that we join by are often the identification variables we mentioned previously.

                This is an important property of what’s known as normal forms of data. The process of decomposing data frames into less redundant tables without losing information is called normalization. More information is available on Wikipedia.
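A quick way to see the redundancy that normalization avoids (a sketch, not from the book): the airline names live in a 16-row data frame rather than being repeated for each of the 336,776 flights.

nrow(airlines)   # 16: each airline name is stored exactly once
nrow(flights)    # 336776: storing the full name here would repeat it thousands of times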

Learning check

(LC4.15) What are some advantages of data in normal forms? What are some disadvantages?

                4.8 Other verbs

Here are some other useful data wrangling verbs that might come in handy:

• select() only a subset of variables/columns
• rename() variables/columns to have new names
• Return only the top_n() values of a variable

                4.8.1 select variables

FIGURE 4.9: Select diagram from Data Wrangling with dplyr and tidyr cheatsheet

We've seen that the flights data frame in the nycflights13 package contains 19 different variables. You can identify the names of these 19 variables by running the glimpse() function from the dplyr package:

                glimpse(flights)
However, say you only need two of these variables, say carrier and flight. You can select() these two variables:

                flights %>% 
                   select(carrier, flight)
This function makes exploring data frames with a very large number of variables easier for humans to process by restricting consideration to only those we care about, like our example with carrier and flight above. This might make viewing the dataset using the View() spreadsheet viewer more digestible. However, as far as the computer is concerned, it doesn't care how many additional variables are in the data frame in question, so long as carrier and flight are included.

Let's say instead you want to drop, i.e. deselect, certain variables. For example, take the variable year in the flights data frame. This variable isn't quite a "variable" in the sense that all of its values are 2013, i.e. it doesn't change. Say you want to remove the year variable from the data frame; we can deselect year by using the - sign:

                flights_no_year <- flights %>% 
                   select(-year)
glimpse(flights_no_year)

Another way of selecting columns/variables is by specifying a range of columns:

                flight_arr_times <- flights %>% 
                   select(month:day, arr_time:sched_arr_time)
                 flight_arr_times
The select() function can also be used to reorder columns in combination with the everything() helper function. Let's suppose we'd like the hour, minute, and time_hour variables, which appear at the end of the flights dataset, to appear immediately after the year, month, and day variables, while keeping the rest of the variables. In the code below, everything() picks up all remaining variables:

                flights_reorder <- flights %>% 
  select(year, month, day, hour, minute, time_hour, everything())
glimpse(flights_reorder)

Lastly, the helper functions starts_with(), ends_with(), and contains() can be used to select variables/columns whose names match those conditions. For example:

                flights_begin_a <- flights %>% 
                   select(starts_with("a"))
                 flights_begin_a
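For instance, sketches of the other two helpers (these particular calls are our own examples, not the book's):

flights %>% select(ends_with("delay"))   # dep_delay and arr_delay
flights %>% select(contains("time"))     # every variable whose name contains "time"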
4.8.2 rename variables

Another useful function is rename(), which as you may have guessed renames one column to another name. Suppose we want dep_time and arr_time to be departure_time and arrival_time instead. Below we select() only the variables containing "time" from flights and then rename() these two columns, saving the result in flights_time_new:

          flights_time_new <- flights %>% 
             select(contains("time")) %>% 
             rename(departure_time = dep_time,
                    arrival_time = arr_time)
glimpse(flights_time_new)

          Note that in this case we used a single = sign within the rename(), for example departure_time = dep_time. This is because we are not testing for equality like we would using ==, but instead we want to assign a new variable departure_time to have the same values as dep_time and then delete the variable dep_time. It’s easy to forget if the new name comes before or after the equals sign. I usually remember this as “New Before, Old After” or NBOA.

          4.8.3 top_n values of a variable

We can also return the top n values of a variable using the top_n() function. For example, we can return a data frame of the top 10 destination airports using the example from Section 4.7.2. Observe that we set the number of values to return to n = 10 and wt = num_flights to indicate that we want the rows corresponding to the top 10 values of num_flights. See the help file for top_n() by running ?top_n for more information.

          named_dests %>% 
             top_n(n = 10, wt = num_flights)
Let's further arrange() these results in descending order of num_flights:

          named_dests  %>% 
             top_n(n = 10, wt = num_flights) %>% 
             arrange(desc(num_flights))

          Learning check

(LC4.16) What are some ways to select all three of the dest, air_time, and distance variables from flights? Give the code showing how to do this in at least three different ways.

(LC4.17) How could one use starts_with, ends_with, and contains to select columns from the flights data frame? Provide three different examples in total: one for starts_with, one for ends_with, and one for contains.

(LC4.18) Why might we want to use the select function on a data frame?

(LC4.19) Create a new data frame that shows the top 5 airports with the largest arrival delays from NYC in 2013.


          4.9 Conclusion

          4.9.1 Summary table

Let's recap our data wrangling verbs in Table 4.1. Using these verbs and the pipe %>% operator from Section 4.1, you'll be able to write easily legible code to perform almost all the data wrangling necessary for the rest of this book.

TABLE 4.1: Summary of data wrangling verbs

Verb          Data wrangling operation
filter()      Pick out a subset of rows
summarize()   Summarize many values into one using a summary statistic function
group_by()    Add grouping structure to the rows of a data frame
mutate()      Create new variables by mutating existing ones
arrange()     Arrange rows of a data frame in ascending (default) or descending order of a variable
inner_join()  Join/merge two data frames, matching rows by a key variable

          Learning check

(LC4.20) Let's now put your newly acquired data wrangling skills to the test!

          An airline industry measure of a passenger airline’s capacity is the available seat miles, which is equal to the number of seats available multiplied by the number of miles or kilometers flown summed over all flights. So for example say an airline had 2 flights using a plane with 10 seats that flew 500 miles and 3 flights using a plane with 20 seats that flew 1000 miles, the available seat miles would be 2 \(\times\) 10 \(\times\) 500 \(+\) 3 \(\times\) 20 \(\times\) 1000 = 70,000 seat miles.
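As a quick sanity check of the arithmetic in this toy example (a sketch only, not a solution to the exercise):

2 * 10 * 500 + 3 * 20 * 1000
#> [1] 70000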

          Using the datasets included in the nycflights13 package, compute the available seat miles for each airline sorted in descending order. After completing all the necessary data wrangling steps, the resulting data frame should have 16 rows (one for each airline) and 2 columns (airline name and available seat miles). Here are some hints:


            4.9.2 Additional resources

FIGURE 4.10: Data Transformation with dplyr cheatsheet

On top of the data wrangling verbs and examples we presented in this section, if you'd like to see more examples of using the dplyr package for data wrangling, check out Chapter 5 of Garrett Grolemund and Hadley Wickham's book (Grolemund and Wickham 2016).


                Chapter 5 Data Importing & “Tidy” Data

In Subsection 2.2.1 we introduced the concept of a data frame: a rectangular spreadsheet-like representation of data in R where the rows correspond to observations and the columns correspond to variables describing each observation. In Section 2.4, we started exploring our first data frame: the flights data frame included in the nycflights13 package. In Chapter 3 we created visualizations based on the data included in flights and other data frames such as weather. In Chapter 4, we learned how to wrangle data, in other words how to take existing data frames and transform and modify them to suit our analysis goals.

In this final chapter of the "Data Science via the tidyverse" portion of the book, we extend some of these ideas by discussing a type of data formatting called "tidy" data. You will see that having data stored in "tidy" format is about more than what the colloquial definition of the term "tidy" might suggest: having your data "neatly organized." Instead, we define the term "tidy" in a more rigorous fashion, outlining a set of rules by which data can be stored, and the implications of these rules for analyses.

Although knowledge of this type of data formatting was not necessary for our treatment of data visualization in Chapter 3 and data wrangling in Chapter 4, since all the data was already in "tidy" format, we'll now see this format is actually essential to using the tools we covered in these two chapters. Furthermore, it will also be useful for all subsequent chapters in this book when we cover regression and statistical inference. First, however, we'll show you how to import spreadsheet data for use in R.

                Needed packages

                Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). If needed, read Section 2.3 for information on how to install and load R packages.
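The library() calls themselves are not shown in this excerpt; based on the packages used in this chapter (readr, tidyr, dplyr, ggplot2, nycflights13, and fivethirtyeight), they presumably look something like the following sketch:

library(dplyr)
library(ggplot2)
library(readr)
library(tidyr)
library(nycflights13)
library(fivethirtyeight)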


                5.1 Importing data

Up to this point, we've almost entirely used data stored inside of an R package. Say instead your own data is saved on your computer or somewhere online. How can you analyze this data in R? Spreadsheet data is often saved in one of the following formats:

• A Comma Separated Values .csv file. You can think of a .csv file as a bare-bones spreadsheet where:
  • Each line in the file corresponds to one row of data/one observation.
  • Values for each line are separated with commas. In other words, the values of different variables are separated by commas.
  • The first line is often, but not always, a header row indicating the names of the columns/variables.
• An Excel .xlsx file. This format is based on Microsoft's proprietary Excel software. As opposed to bare-bones .csv files, .xlsx Excel files contain a lot of meta-data, or put more simply, data about the data. (Recall we saw a previous example of meta-data in Section 4.4 when adding "group structure" meta-data to a data frame by using the group_by() verb.) Some examples of spreadsheet meta-data include the use of bold and italic fonts, colored cells, different column widths, and formula macros.
• A Google Sheets file, which is a "cloud" or online-based way to work with a spreadsheet. Google Sheets allows you to download your data in both comma separated values .csv and Excel .xlsx formats: go to the Google Sheets menu bar -> File -> Download as -> select "Microsoft Excel" or "Comma-separated values."

We'll cover two methods for importing .csv and .xlsx spreadsheet data in R: one using the R console and the other using RStudio's graphical user interface, abbreviated as "GUI."

5.1.1 Using the console

First, let's import a Comma Separated Values .csv file of data directly off the internet. The .csv file dem_score.csv, accessible at https://moderndive.com/data/dem_score.csv, contains ratings of the level of democracy in different countries spanning 1952 to 1992. Let's use the read_csv() function from the readr package to read it off the web, import it into R, and save it in a data frame called dem_score:

                library(readr)
                 dem_score <- read_csv("https://moderndive.com/data/dem_score.csv")
                 dem_score
In this dem_score data frame, the minimum value of -10 corresponds to a highly autocratic nation, whereas a value of 10 corresponds to a highly democratic nation. We'll revisit the dem_score data frame in a case study in the upcoming Section 5.3.

Note that the read_csv() function included in the readr package is different than the read.csv() function that comes installed with R by default. While the difference in the names might seem near meaningless (an _ instead of a .), the read_csv() function is in our opinion easier to use since it can more easily read data off the web and generally imports data at a much faster speed.
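For comparison, a quick sketch of the base R equivalent (this example is ours, not the book's): read.csv() also accepts a URL, but returns a plain data.frame rather than a tibble and is typically slower on large files.

dem_score_base <- read.csv("https://moderndive.com/data/dem_score.csv")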

5.1.2 Using RStudio's interface

                Let’s read in the exact same data saved in Excel format, but this time via RStudio’s graphical interface instead of via the R console. First download the Excel file dem_score.xlsx by clicking here, then

1. Go to the Files panel of RStudio.
2. Navigate to the directory, i.e. folder on your computer, where the downloaded dem_score.xlsx Excel file is saved.
3. Click on dem_score.xlsx.
4. Click "Import Dataset…"

                At this point you should see an image like this:

After clicking on the "Import" button on the bottom right of RStudio, RStudio will save this spreadsheet's data in a data frame called dem_score and display its contents in the spreadsheet viewer. Furthermore, note in the bottom right of the above image there exists a "Code Preview": you can copy and paste this code to reload your data again later automatically, instead of repeating the above manual point-and-click process.



                5.2 Tidy data

Let's now switch gears and learn about the concept of "tidy" data format by starting with a motivating example. Let's consider the drinks data frame included in the fivethirtyeight package. Run the following:

                drinks
                # A tibble: 193 x 5
                    country      beer_servings spirit_servings wine_servings total_litres_of_pur…
    <chr>                <int>           <int>         <int>                <dbl>
 ...
 9 Australia              261              72           212                 10.4
10 Austria                279              75           191                  9.7
# … with 183 more rows
After reading the help file by running ?drinks, we see that drinks is a data frame containing results from a survey of the average number of servings of beer, spirits, and wine consumed in 193 countries. This data was originally reported on the data journalism website FiveThirtyEight.com in Mona Chalabi's article "Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?"

Let's apply some of the data wrangling verbs we learned in Chapter 4 to the drinks data frame. Let's

1. filter() the drinks data frame to only consider 4 countries (the United States, China, Italy, and Saudi Arabia), then
2. select() all columns except total_litres_of_pure_alcohol by using the - sign, then
3. rename() the variables beer_servings, spirit_servings, and wine_servings to beer, spirit, and wine, respectively,

                and save the resulting data frame in drinks_smaller.

                drinks_smaller <- drinks %>% 
                   filter(country %in% c("USA", "China", "Italy", "Saudi Arabia")) %>% 
                   select(-total_litres_of_pure_alcohol) %>% 
   rename(beer = beer_servings, spirit = spirit_servings, wine = wine_servings)
drinks_smaller
# A tibble: 4 x 4
  country       beer spirit  wine
  <chr>        <int>  <int> <int>
1 China           79    192     8
2 Italy           85     42   237
3 Saudi Arabia     0      5     0
4 USA            249    158    84
Using the drinks_smaller data frame, how would we create the side-by-side AKA dodged barplot in Figure 5.1? Recall we saw barplots displaying two categorical variables in Section 3.8.3.

FIGURE 5.1: Alcohol consumption in 4 countries.

                Let’s break down the Grammar of Graphics:

1. The categorical variable country with four levels (China, Italy, Saudi Arabia, USA) would have to be mapped to the x-position of the bars.
2. The numerical variable servings would have to be mapped to the y-position of the bars, in other words the height of the bars.
3. The categorical variable type with three levels (beer, spirit, wine) would have to be mapped to the fill color of the bars.
Observe however that drinks_smaller has three separate variables for beer, spirit, and wine, whereas in order to recreate the side-by-side AKA dodged barplot in Figure 5.1 we would need a single variable type with three possible values (beer, spirit, and wine), which we would then map to the fill aesthetic. In other words, for us to be able to create the barplot in Figure 5.1, our data frame would have to look like this:

                drinks_smaller_tidy
                # A tibble: 12 x 3
                    country      type   servings
    <chr>        <chr>     <int>
  1 China        beer         79
  2 Italy        beer         85
  3 Saudi Arabia beer          0
  4 USA          beer        249
  5 China        spirit      192
  6 Italy        spirit       42
  7 Saudi Arabia spirit        5
  8 USA          spirit      158
  9 China        wine          8
 10 Italy        wine        237
 11 Saudi Arabia wine          0
 12 USA          wine         84
Let's compare drinks_smaller_tidy with the drinks_smaller data frame from earlier:

drinks_smaller
# A tibble: 4 x 4
  country       beer spirit  wine
  <chr>        <int>  <int> <int>
1 China           79    192     8
2 Italy           85     42   237
3 Saudi Arabia     0      5     0
4 USA            249    158    84

Observe that while drinks_smaller and drinks_smaller_tidy are both rectangular in shape and contain the same 12 numerical values (3 alcohol types \(\times\) 4 countries), they are formatted differently. drinks_smaller is formatted in what's known as "wide" format, whereas drinks_smaller_tidy is formatted in what's known as "long/narrow" format. In the context of using R, long/narrow format is also known as "tidy" format. Furthermore, in order to use the ggplot2 and dplyr packages for data visualization and data wrangling, your input data frames must be in "tidy" format. So all non-"tidy" data must be converted to "tidy" format first.

Before we show you how to convert non-"tidy" data frames like drinks_smaller to "tidy" data frames like drinks_smaller_tidy, let's go over the explicit definition of "tidy" data.


                5.2.1 Definition of “tidy” data

                You have surely heard the word “tidy” in your life:

What does it mean for your data to be "tidy"? While "tidy" has a clear English meaning of "organized," "tidy" in the context of data science using R means that your data follows a standardized format. We will follow Hadley Wickham's definition of tidy data here (Wickham 2014):

                A dataset is a collection of values, usually either numbers (if quantitative) or strings AKA text data (if qualitative). Values are organised in two ways. Every value belongs to a variable and an observation. A variable contains all values that measure the same underlying attribute (like height, temperature, duration) across units. An observation contains all values measured on the same unit (like a person, or a day, or a city) across attributes.

FIGURE 5.2: Tidy data graphic from R for Data Science.

For example, say you have the following table of stock prices in Table 5.1:


TABLE 5.1: Stock Prices (Non-Tidy Format)

Although the data are neatly organized in a rectangular spreadsheet-type format, they are not in tidy format because while there are three variables corresponding to three unique pieces of information (Date, Stock Name, and Stock Price), there are not three columns. In "tidy" data format each variable should be its own column, as shown in Table 5.2. Notice that both tables present the same information, but in different formats.

TABLE 5.2: Stock Prices (Tidy Format)

Now we have the requisite three columns Date, Stock Name, and Stock Price. On the other hand, consider the data in Table 5.3.

TABLE 5.3: Date, Boeing Price, Weather Data

In this case, even though the variable "Boeing Price" occurs just like in our non-"tidy" data in Table 5.1, the data is "tidy" since there are three variables corresponding to three unique pieces of information: Date, Boeing stock price, and the weather that particular day.

Learning check

          (LC5.1) What are common characteristics of “tidy” data frames?


          (LC5.2) What makes “tidy” data frames useful for organizing data?


5.2.2 Converting to "tidy" data

In this book so far, you've only seen data frames that were already in "tidy" format. Furthermore, for the rest of this book, you'll mostly only see data frames that are already in "tidy" format as well. This is not always the case, however, with data in the wild. If your original data frame is in wide, i.e. non-"tidy," format and you would like to use the ggplot2 package for data visualization or the dplyr package for data wrangling, you will first have to convert it to "tidy" format using the gather() function in the tidyr package (Wickham and Henry 2018).


          Going back to our drinks_smaller data frame from earlier:

          drinks_smaller
          # A tibble: 4 x 4
             country       beer spirit  wine
  <chr>        <int>  <int> <int>
1 China           79    192     8
2 Italy           85     42   237
3 Saudi Arabia     0      5     0
4 USA            249    158    84

We convert it to "tidy" format by using the gather() function from the tidyr package as follows:

          drinks_smaller_tidy <- drinks_smaller %>% 
             gather(key = type, value = servings, -country)
           drinks_smaller_tidy
# A tibble: 12 x 3
   country      type   servings
   <chr>        <chr>     <int>
 ...
10 Italy        wine        237
11 Saudi Arabia wine          0
12 USA          wine         84

We set the arguments to gather() as follows:

1. key is the name of the column/variable in the new "tidy" data frame that contains the column names of the original data frame that you want to tidy. Observe how we set key = type, and in the resulting drinks_smaller_tidy the column type contains the three types of alcohol: beer, spirit, and wine.
2. value is the name of the column/variable in the "tidy" data frame that contains the values from the rows and columns of the original data frame you want to tidy. Observe how we set value = servings, and in the resulting drinks_smaller_tidy the column servings contains the 4 \(\times\) 3 = 12 numerical values.
3. The third argument is the columns you either want to or don't want to tidy. Observe how we set this to -country, indicating that we don't want to tidy the country variable in drinks_smaller and rather only beer, spirit, and wine.
          -

The third argument is a little nuanced, so let’s consider another example. The code below is very similar, but now the third argument specifies which columns we want to tidy, c(beer, spirit, wine), instead of the columns we don’t want to tidy, -country. Note the use of c() to create a vector of the columns in drinks_smaller that we’d like to tidy. If you run the code below, you’ll see that the resulting drinks_smaller_tidy is the same.

drinks_smaller_tidy <- drinks_smaller %>% 
  gather(key = type, value = servings, c(beer, spirit, wine))
drinks_smaller_tidy

With our drinks_smaller_tidy “tidy” format data frame, we can now produce a side-by-side AKA dodged barplot using geom_col() and not geom_bar(), since we would like to map the servings variable to the y-aesthetic of the bars.

          ggplot(drinks_smaller_tidy, aes(x=country, y=servings, fill=type)) +
             geom_col(position = "dodge")

          Converting “wide” format data to “tidy” format often confuses new R users. The only way to learn to get comfortable with the gather() function is with practice, practice, and more practice. For example, see the examples in the bottom of the help file for gather() by running ?gather. We’ll show another example of using gather() to convert a “wide” formatted data frame to “tidy” format in Section 5.3. For other examples of converting a dataset into “tidy” format, check out the different functions available for data tidying and a case study using data from the World Health Organization in R for Data Science (Grolemund and Wickham 2016).
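If you are curious about going in the other direction, here is a minimal sketch (not required for this chapter) using tidyr’s spread() function, which performs the reverse of gather(): it spreads a key column and a value column back out into wide format. The name drinks_smaller_wide is just an illustrative choice.

# Reverse the gather() from above: spread the type/servings pair back out
# into one column per type of alcohol, recovering the original wide format.
drinks_smaller_wide <- drinks_smaller_tidy %>% 
  spread(key = type, value = servings)
drinks_smaller_wide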

          Learning check


(LC5.3) Take a look at the airline_safety data frame included in the fivethirtyeight package. Run the following:

airline_safety

After reading the help file by running ?airline_safety, we see that airline_safety is a data frame containing information on different airline companies’ safety records. This data was originally reported on the data journalism website FiveThirtyEight.com in Nate Silver’s article “Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?”. Let’s ignore the incl_reg_subsidiaries and avail_seat_km_per_week variables for simplicity:

airline_safety_smaller <- airline_safety %>% 
  select(-c(incl_reg_subsidiaries, avail_seat_km_per_week))
airline_safety_smaller
# A tibble: 56 x 7
   airline incidents_85_99 fatal_accidents… fatalities_85_99 incidents_00_14
   <chr>             <int>            <int>            <int>           <int>
 1 Aer Li…               2                0                0               0
 2 Aerofl…              76               14              128               6
 3 Aeroli…               6                0                0               1
 4 Aerome…               3                1               64               5
 5 Air Ca…               2                0                0               2
 6 Air Fr…              14                4               79               6
 7 Air In…               2                1              329               4
 8 Air Ne…               3                0                0               5
 9 Alaska…               5                0                0               5
10 Alital…               7                2               50               4
# … with 46 more rows, and 2 more variables: fatal_accidents_00_14 <int>,
#   fatalities_00_14 <int>

          This data frame is not in “tidy” format. How would you convert this data frame to be in “tidy” format, in particular so that it has a variable incident_type_years indicating the incident type/year and a variable count of the counts?


          5.2.3 nycflights13 package

Recall the nycflights13 package with data about all domestic flights departing from New York City in 2013 that we introduced in Section 2.4 and used extensively in Chapter 3 on data visualization and Chapter 4 on data wrangling. Let’s revisit the flights data frame by running View(flights). We saw that flights has a rectangular shape, with each of its 336,776 rows corresponding to a flight and each of its 22 columns corresponding to different characteristics/measurements of each flight. This matches exactly with our definition of “tidy” data from above:

          1. Each variable forms a column.
          2. Each observation forms a row.
3. Each type of observational unit forms a table.
Observational units:

Recall that we also saw in Section 2.4.3 that the observational unit for the flights data frame is an individual flight. In other words, the rows of the flights data frame refer to characteristics/measurements of individual flights. Also included in the nycflights13 package are other data frames with their rows representing different observational units (Wickham 2018):

• airlines: translation between two letter IATA carrier codes and names (16 in total), i.e. the observational unit is an airline company.
• planes: construction information about each of 3,322 planes used, i.e. the observational unit is an aircraft.
• weather: hourly meteorological data (about 8,705 observations) for each of the three NYC airports, i.e. the observational unit is an hourly measurement.
• airports: airport names and locations, i.e. the observational unit is an airport.

          The organization of this data follows the third “tidy” data property: observations corresponding to the same observational unit should be saved in the same table/data frame. Another example involves a spreadsheet of all students enrolled in a university along with information about them, such as name, gender, and date of birth. Each row represents an individual student, which is the observational unit in question.
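To make this concrete, here is a minimal sketch of such a spreadsheet as a data frame; the students data frame and every value in it are hypothetical, made up purely for illustration:

library(tibble)

# Hypothetical students data frame: each row is one student (the observational
# unit); name, gender, and date_of_birth are variables describing that student.
students <- tibble(
  name          = c("Ana", "Bruno", "Carla"),
  gender        = c("F", "M", "F"),
  date_of_birth = c("2000-03-14", "1999-11-02", "2000-03-14")
)
students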

Identification vs measurement variables:

There is a subtle difference between the kinds of variables that you will encounter in data frames: measurement variables and identification variables. The airports data frame you worked with above contains both these types of variables. Recall that in airports the observational unit is an airport, and thus each row corresponds to one particular airport. Let’s pull them apart using the glimpse() function:

glimpse(airports)
Observations: 1,458
Variables: 8
$ faa   <chr> "04G", "06A", "06C", "06N", "09J", "0A9", "0G6", "0G7", "0P2", …
$ name  <chr> "Lansdowne Airport", "Moton Field Municipal Airport", "Schaumbu…
$ lat   <dbl> 41.1, 32.5, 42.0, 41.4, 31.1, 36.4, 41.5, 42.9, 39.8, 48.1, 39.…
$ lon   <dbl> -80.6, -85.7, -88.1, -74.4, -81.4, -82.2, -84.5, -76.8, -76.6, …
$ alt   <int> 1044, 264, 801, 523, 11, 1593, 730, 492, 1000, 108, 409, 875, 1…
$ tz    <dbl> -5, -6, -6, -5, -5, -5, -5, -5, -5, -8, -5, -6, -5, -5, -5, -5,…
$ dst   <chr> "A", "A", "A", "A", "A", "A", "A", "A", "U", "A", "A", "U", "A"…
$ tzone <chr> "America/New_York", "America/Chicago", "America/Chicago", "Amer…

The variables faa and name are what we will call identification variables: variables that uniquely identify each observational unit. They are mainly used to provide a unique name to each observational unit, thereby allowing us to uniquely identify them. faa gives the unique code provided by the FAA for that airport, while the name variable gives the longer, more natural name of the airport. The remaining variables (lat, lon, alt, tz, dst, tzone) are often called measurement or characteristic variables: variables that describe properties of each observational unit, in other words each observation in each row. For example, lat and lon describe the latitude and longitude of each airport.
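To see this split for yourself, here is a minimal sketch using the select() function from dplyr; this particular pairing of select() calls is just an illustration, not from the text:

library(dplyr)
library(nycflights13)

# Identification variables: uniquely label each airport
airports %>% select(faa, name)

# Measurement/characteristic variables: describe properties of each airport
airports %>% select(lat, lon, alt, tz, dst, tzone)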

So in our above example of a spreadsheet of all students enrolled at a university, email address could be treated as an identification variable since it uniquely identifies each observational unit, i.e. each student, while date of birth could not, since it is possible (and highly probable) that two students share the same birthday.

          Furthermore, sometimes a single variable might not be enough to uniquely identify each observational unit: combinations of variables might be needed (see Learning Check below). While it is not an absolute rule, for organizational purposes it is considered good practice to have your identification variables in the far left-most columns of your data frame.

The organization of the information into these five data frames follows the third “tidy” data property: observations corresponding to the same observational unit should be saved in the same table, i.e. data frame. You could think of this property as the old English expression: “birds of a feather flock together.”
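A quick way to convince yourself of these different observational units is to look at how many rows each data frame has; a minimal sketch, assuming the nycflights13 package is loaded:

# One row per observational unit: carrier, aircraft, hourly measurement, airport.
nrow(airlines)
nrow(planes)
nrow(weather)
nrow(airports)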


          5.3 Case study: Democracy in Guatemala

In this section, we’ll show you another example of how to convert a data frame that isn’t in “tidy” format, i.e. “wide” format, to a data frame that is in “tidy” format, i.e. “long/narrow” format. We’ll do this using the gather() function from the tidyr package again. Furthermore, we’ll make use of some of the ggplot2 data visualization and dplyr data wrangling tools you learned in Chapters 3 and 4.
          Let’s use the dem_score data frame we imported in Section 5.1, but focus on only data corresponding to Guatemala.

          guat_dem <- dem_score %>% 
             filter(country == "Guatemala")
           guat_dem
# A tibble: 1 x 10
  country   `1952` `1957` `1962` `1967` `1972` `1977` `1982` `1987` `1992`
  <chr>      <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Guatemala      2     -6     -5      3      1     -3     -7      3      3

Now let’s produce a time-series plot showing how the democracy scores have changed over the 40 years from 1952 to 1992 for Guatemala. Recall that we saw time-series plots in Section 3.4 on creating linegraphs using geom_line(). Let’s lay out the Grammar of Graphics we saw in Section 3.1.

First, we know we need to set data = guat_dem and use a geom_line() layer, but what is the aesthetic mapping of variables? We’d like to see how the democracy score has changed over the years, so we need to map:

• year to the x-position aesthetic and
• democracy_score to the y-position aesthetic

Now we are stuck in a predicament, much like with our drinks_smaller example in Section 5.2. We see that we have a variable named country, but its only value is "Guatemala". We have other variables denoted by different year values. Unfortunately, the guat_dem data frame is not “tidy” and hence is not in the appropriate format to apply the Grammar of Graphics, and thus we cannot use the ggplot2 package. We need to take the values of the columns corresponding to years in guat_dem and convert them into a new “key” variable called year. Furthermore, we’d like to take the democracy scores on the inside of the table and turn them into a new “value” variable called democracy_score. Our resulting data frame will thus have three columns: country, year, and democracy_score.

Recall that the gather() function in the tidyr package can complete this task for us:

guat_dem_tidy <- guat_dem %>% 
  gather(key = year, value = democracy_score, -country) 
guat_dem_tidy
          # A tibble: 9 x 3
             country   year  democracy_score
             <chr>     <chr>           <dbl>
1 Guatemala 1952                2
2 Guatemala 1957               -6
3 Guatemala 1962               -5
4 Guatemala 1967                3
5 Guatemala 1972                1
6 Guatemala 1977               -3
7 Guatemala 1982               -7
8 Guatemala 1987                3
9 Guatemala 1992                3

We set the arguments to gather() as follows:

1. key is the name of the column/variable in the new “tidy” data frame that contains the column names of the original data frame that you want to tidy. Observe how we set key = year and in the resulting guat_dem_tidy the column year contains the years when Guatemala’s democracy scores were measured.
2. value is the name of the column/variable in the “tidy” data frame that contains the rows and columns of values in the original data frame you want to tidy. Observe how we set value = democracy_score and in the resulting guat_dem_tidy the column democracy_score contains the 1 \(\times\) 9 = 9 democracy scores.
3. The third argument is the columns you either want to or don’t want to tidy. Observe how we set this to -country, indicating that we don’t want to tidy the country variable in guat_dem and rather only 1952 through 1992.

However, observe in the output for guat_dem_tidy that the year variable is of type chr or character. Before we can plot this variable on the x-axis, we need to convert it into a numerical variable using the as.numeric() function within the mutate() function, which we saw in Section 4.5 on mutating existing variables to create new ones.

guat_dem_tidy <- guat_dem_tidy %>% 
  mutate(year = as.numeric(year))

We can now create the plot to show how the democracy score of Guatemala changed from 1952 to 1992 using a geom_line():

ggplot(guat_dem_tidy, aes(x = year, y = democracy_score)) +
  geom_line() +
  labs(x = "Year", y = "Democracy Score", title = "Democracy score in Guatemala 1952-1992")

FIGURE 5.3: Guatemala’s democracy score ratings from 1952 to 1992
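To tie the steps above together, here is a minimal sketch (not from the text) that performs the entire wrangle-then-plot sequence in a single pipeline. It assumes dem_score and the dplyr, tidyr, readr, and ggplot2 packages are loaded as at the start of the chapter; readr’s parse_number() is used as an alternative to as.numeric() for converting year, since it also strips any non-numeric characters.

# Filter to Guatemala, convert to "tidy" format, convert year to a number,
# and pipe the result straight into ggplot().
dem_score %>% 
  filter(country == "Guatemala") %>% 
  gather(key = year, value = democracy_score, -country) %>% 
  mutate(year = parse_number(year)) %>% 
  ggplot(aes(x = year, y = democracy_score)) +
  geom_line() +
  labs(x = "Year", y = "Democracy Score")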

Learning check

          5.4 Conclusion

          5.4.1 tidyverse package

Notice at the beginning of the chapter we loaded the following four packages, which are four of the most frequently used R packages for data science:

          library(dplyr)
           library(ggplot2)
           library(readr)
           library(tidyr)

          There is a much quicker way to load these packages than by individually loading them as we did above: by installing and loading the tidyverse package. The tidyverse package acts as an “umbrella” package whereby installing/loading it will install/load multiple packages at once for you. So after installing the tidyverse package as you would a normal package, running this:

          library(tidyverse)

          would be the same as running this:

          library(ggplot2)
library(dplyr)
library(tidyr)
library(readr)
library(purrr)
library(tibble)
library(stringr)
library(forcats)

You’ve seen the first four of these packages: ggplot2 for data visualization, dplyr for data wrangling, tidyr for converting data to “tidy” format, and readr for importing spreadsheet data into R. The remaining packages (purrr, tibble, stringr, and forcats) are left for a more advanced book; check out R for Data Science to learn about these packages.

The tidyverse “umbrella” package gets its name from the fact that all functions in all its constituent packages are designed so that all input/argument data frames are in “tidy” format and all output data frames are in “tidy” format as well. This standardization of input and output data frames makes transitions between the various functions in these packages as seamless as possible.

          5.4.2 Optional: Normal forms of data

The datasets included in the nycflights13 package are in a form that minimizes redundancy of data. We will see that there are ways to merge (or join) the different tables together easily. We are capable of doing so because each of the tables has keys in common to relate one to another. This is an important property of normal forms of data. The process of decomposing data frames into less redundant tables without losing information is called normalization. More information is available on Wikipedia.

We saw an example of this above with the airlines dataset. While the flights data frame could also include a column with the names of the airlines instead of the carrier code, this would be repetitive since there is a unique mapping of the carrier code to the name of the airline/carrier.

Below is an example showing how to join the airlines data frame together with the flights data frame by linking together the two datasets via a common key of "carrier". Note that this “joined” data frame is assigned to a new data frame called joined_flights. The key variable that we frequently join by is one of the identification variables mentioned above.

joined_flights <- inner_join(x = flights, y = airlines, by = "carrier")
View(joined_flights)

          If we View() this dataset, we see a new variable has been created called name. (We will see in Subsection 4.8.2 ways to change name to a more descriptive variable name.) More discussion about joining data frames together will be given in Chapter 4. We will see there that the names of the columns to be linked need not match as they did here with "carrier".
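As a preview of the renaming mentioned above, here is a minimal sketch using dplyr’s rename() function; the new variable name airline_name is just an illustrative choice, not necessarily the one used later in the book:

# Give the joined-in `name` column a more descriptive variable name.
joined_flights <- joined_flights %>% 
  rename(airline_name = name)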

Learning check

(LC5.8) What are some advantages of data in normal forms? What are some disadvantages?

5.4.3 Additional resources

          An R script file of all R code used in this chapter is available here.

If you want to learn more about using the readr and tidyr packages, we suggest that you check out RStudio’s “Data Import” cheatsheet. You can access this cheatsheet by going to RStudio’s cheatsheet page and searching for “Data Import Cheat Sheet”.

FIGURE 5.4: Data Import cheatsheet

5.4.4 What’s to come?

Congratulations! We’ve completed the “Data Science via the tidyverse” portion of this book! We’ll now move to the “data modeling” portion in Chapters 6 and 7, where you’ll leverage your data visualization and wrangling skills to model relationships between different variables in data frames. However, we’re going to leave Chapter 11 on “Inference for Regression” until after we’ve covered statistical inference.

FIGURE 5.5: ModernDive flowchart - On to Part II!

diff --git a/docs/6-appendixD.html b/docs/6-appendixD.html
deleted file mode 100644
index 7afc054fe..000000000
--- a/docs/6-appendixD.html
+++ /dev/null

          Chapter 6 Learning Check Solutions

6.1 Chapter 2 Solutions

library(dplyr)
library(ggplot2)
library(nycflights13)

          (LC2.1) What does any ONE row in this flights dataset refer to?

• A. Data on an airline
• B. Data on a flight
• C. Data on an airport
• D. Data on multiple flights

Solution: This is data on a flight. Not a flight path! Example:

• a flight path would be United 1545 to Houston
• a flight would be United 1545 to Houston at a specific date/time. For example: 2013/1/1 at 5:15am.

          (LC2.2) What are some examples in this dataset of categorical variables? What makes them different than quantitative variables?

Solution: Hint: Type ?flights in the console to see what all the variables mean!

• Categorical:
  • carrier: the company
  • dest: the destination
  • flight: the flight number. Even though this is a number, it’s simply a label. Example: United 1545 is not less than United 1714.
• Quantitative:
  • distance: the distance in miles
  • time_hour: time

          (LC2.3) What does int, dbl, and chr mean in the output above?

Solution:

• int: integer. Used to count things, i.e. a discrete value. Ex: the # of cars parked in a lot
• dbl: double. Used to measure things, i.e. a continuous value. Ex: your height in inches
• chr: character, i.e. text
          6.2 Chapter 3 Solutions

library(nycflights13)
library(ggplot2)
library(dplyr)

          (LC3.1) Take a look at both the flights and alaska_flights data frames by running View(flights) and View(alaska_flights) in the console. In what respect do these data frames differ?

          -

          Solution: flights contains all flight data, while alaska_flights contains only data from Alaskan carrier “AS”. We can see that flights has 336776 rows while alaska_flights has only 714

          -

          (LC3.2) What are some practical reasons why dep_delay and arr_delay have a positive relationship?

          -

          Solution: The later a plane departs, typically the later it will arrive.

          -

          (LC3.3) What variables (not necessarily in the flights data frame) would you expect to have a negative correlation (i.e. a negative relationship) with dep_delay? Why? Remember that we are focusing on numerical variables here.

          -

Solution: An example in the weather dataset is visibility, which measures visibility in miles. As visibility increases, we would expect departure delays to decrease.

          -

          (LC3.4) Why do you believe there is a cluster of points near (0, 0)? What does (0, 0) correspond to in terms of the Alaskan flights?

          -

          Solution: The point (0,0) means no delay in departure nor arrival. From the point of view of Alaska airlines, this means the flight was on time. It seems most flights are at least close to being on time.

          -

          (LC3.5) What are some other features of the plot that stand out to you?

          -

          Solution: Different people will answer this one differently. One answer is most flights depart and arrive less than an hour late.

          -

          (LC3.6) Create a new scatterplot using different variables in the alaska_flights data frame by modifying the example above.

          -

Solution: Many possibilities for this one; see the plot below. Is there a pattern in departure delay depending on when the flight is scheduled to depart? Interestingly, there seem to be only two blocks of time where flights depart.

ggplot(data = alaska_flights, mapping = aes(x = dep_time, y = dep_delay)) +
  geom_point()

          -

          (LC3.7) Why is setting the alpha argument value useful with scatterplots? What further information does it give you that a regular scatterplot cannot?

          -

          Solution: Why is setting the alpha argument value useful with scatterplots? What further information does it give you that a regular scatterplot cannot? It thins out the points so we address overplotting. But more importantly it hints at the (statistical) density and distribution of the points: where are the points concentrated, where do they occur. We will see more about densities and distributions in Chapter 6 when we switch gears to statistical topics.

          -

          (LC3.8) After viewing the Figure 3.4 above, give an approximate range of arrival delays and departure delays that occur the most frequently. How has that region changed compared to when you observed the same plot without the alpha = 0.2 set in Figure 3.2?

          -

          Solution: After viewing the Figure 3.4 above, give a range of arrival delays and departure delays that occur most frequently? How has that region changed compared to when you observed the same plot without the alpha = 0.2 set in Figure 3.2? The lower plot suggests that most Alaska flights from NYC depart between 12 minutes early and on time and arrive between 50 minutes early and on time.

          -

          (LC3.9) Take a look at both the weather and early_january_weather data frames by running View(weather) and View(early_january_weather) in the console. In what respect do these data frames differ?

          -

          Solution: Take a look at both the weather and early_january_weather data frames by running View(weather) and View(early_january_weather) in the console. In what respect do these data frames differ? The rows of early_january_weather are a subset of weather.

          -

          (LC3.10) View() the flights data frame again. Why does the time_hour variable uniquely identify the hour of the measurement whereas the hour variable does not?

          -

Solution: Because to uniquely identify an hour, we need the year/month/day/hour combination, whereas there are only 24 possible values of hour.

          -

          (LC3.11) Why should linegraphs be avoided when there is not a clear ordering of the horizontal axis?

          -

          Solution: Why should linegraphs be avoided when there is not a clear ordering of the horizontal axis? Because lines suggest connectedness and ordering.

          -

          (LC3.12) Why are linegraphs frequently used when time is the explanatory variable?

          -

          Solution: Why are linegraphs frequently used when time is the explanatory variable? Because time is sequential: subsequent observations are closely related to each other.

          -

          (LC3.13) Plot a time series of a variable other than temp for Newark Airport in the first 15 days of January 2013.

          -

Solution: Humidity is a good one to look at, since it is very closely related to the cycles of a day.

ggplot(data = early_january_weather, mapping = aes(x = time_hour, y = humid)) +
  geom_line()

          -

          (LC3.14) What does changing the number of bins from 30 to 60 tell us about the distribution of temperatures?

          -

Solution: The distribution doesn’t change much. But by refining the bin width, we see that the temperature data has a high degree of accuracy. What do I mean by accuracy? Looking at the temp variable by View(weather), we see that the precision of each temperature recording is 2 decimal places.

          -

          (LC3.15) Would you classify the distribution of temperatures as symmetric or skewed?

          -

          Solution: It is rather symmetric, i.e. there are no long tails on only one side of the distribution

          -

          (LC3.16) What would you guess is the “center” value in this distribution? Why did you make that choice?

          -

          Solution: The center is around 55.2603921°F. By running the summary() command, we see that the mean and median are very similar. In fact, when the distribution is symmetric the mean equals the median.
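A minimal sketch of the comparison described above, assuming the weather data frame from nycflights13 is loaded:

# summary() reports the mean and median (among other statistics) of temp;
# for a roughly symmetric distribution these two measures of center are close.
summary(weather$temp)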

          -

          (LC3.17) Is this data spread out greatly from the center or is it close? Why?

          -

          Solution: This can only be answered relatively speaking! Let’s pick things to be relative to Seattle, WA temperatures:

While it appears that Seattle weather has a similar center of 55°F, its temperatures are almost entirely between 35°F and 75°F, for a range of about 40°F. Seattle temperatures are much less spread out than New York, i.e. much more consistent over the year. New York, on the other hand, has much colder days in the winter and much hotter days in the summer. Expressed differently, the middle 50% of values, as delineated by the interquartile range, is 30°F.

          -

          (LC3.18) What other things do you notice about the faceted plot above? How does a faceted plot help us see relationships between two variables?

          -

          Solution:

          -
• Certain months have much more consistent weather (August in particular), while others have crazy variability, like January and October, representing changes in the seasons.
• The two variables whose relationship we are seeing are temp and month.

          (LC3.19) What do the numbers 1-12 correspond to in the plot above? What about 25, 50, 75, 100?

          -

          Solution:

          -
• While month is technically a number between 1-12, we’re viewing it as a categorical variable here. Specifically, an ordinal categorical variable since there is an ordering to the categories.
• 25, 50, 75, 100 are temperatures.

          (LC3.20) For which types of data-sets would these types of faceted plots not work well in comparing relationships between variables? Give an example describing the nature of these variables and other important characteristics.

          -

          Solution:

          -
• We’d have 365 facets to look at. Way too many.
• We don’t really care about day-to-day fluctuation in weather so much, but maybe more about week-to-week variation. We’d like to focus on seasonal trends.

          (LC3.21) Does the temp variable in the weather data-set have a lot of variability? Why do you say that?

          -

          Solution: Again, like in LC (LC3.17), this is a relative question. I would say yes, because in New York City, you have 4 clear seasons with different weather. Whereas in Seattle WA and Portland OR, you have two seasons: summer and rain!

          -

          (LC3.22) What does the dot at the bottom of the plot for May correspond to? Explain what might have occurred in May to produce this point.

          -

          Solution: It appears to be an outlier. Let’s revisit the use of the filter command to hone in on it. We want all data points where the month is 5 and temp<25

weather %>% 
  filter(month == 5 & temp < 25)
# A tibble: 1 x 15
  origin  year month   day  hour  temp  dewp humid wind_dir wind_speed wind_gust
  <chr>  <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>     <dbl>
1 JFK     2013     5     8    22  13.1  12.0  95.3       80       8.06        NA
# ... with 4 more variables: precip <dbl>, pressure <dbl>, visib <dbl>,
#   time_hour <dttm>

          There appears to be only one hour and only at JFK that recorded 13.1 F (-10.5 C) in the month of May. This is probably a data entry mistake! Why wasn’t the weather at least similar at EWR (Newark) and LGA (La Guardia)?

          -

          (LC3.23) Which months have the highest variability in temperature? What reasons do you think this is?

          -

          Solution: We are now interested in the spread of the data. One measure some of you may have seen previously is the standard deviation. But in this plot we can read off the Interquartile Range (IQR):

          -
            -
          • The distance from the 1st to the 3rd quartiles i.e. the length of the boxes
          • -
          • You can also think of this as the spread of the middle 50% of the data
          • -
          -

          Just from eyeballing it, it seems

          -
            -
          • November has the biggest IQR, i.e. the widest box, so has the most variation in temperature
          • -
          • August has the smallest IQR, i.e. the narrowest box, so is the most consistent temperature-wise
          • -
          -

Here’s how we compute the exact IQR values for each month (we’ll see this more in depth in Chapter 5 of the text):

1. group the observations by month, then
2. for each group, i.e. month, summarize it by applying the summary statistic function IQR(), while making sure to skip over missing data via na.rm = TRUE, then
3. arrange the table in descending order of IQR

weather %>%
  group_by(month) %>%
  summarize(IQR = IQR(temp, na.rm = TRUE)) %>%
  arrange(desc(IQR))

month   IQR
   11 16.02
   12 14.04
    1 13.77
    9 12.06
    4 12.06
    5 11.88
    6 10.98
   10 10.98
    2 10.08
    7  9.18
    3  9.00
    8  7.02

          (LC3.24) We looked at the distribution of a numerical variable over a categorical variable here with this boxplot. Why can’t we look at the distribution of one numerical variable over the distribution of another numerical variable? Say, temperature across pressure, for example?

          -

          Solution: Because we need a way to group many numerical observations together, say by grouping by month. For pressure, we have near unique values for pressure, i.e. no groups, so we can’t make boxplots.

          -

          (LC3.25) Boxplots provide a simple way to identify outliers. Why may outliers be easier to identify when looking at a boxplot instead of a faceted histogram?

          -

Solution: In a histogram, the bin corresponding to where an outlier lies may not be high enough for us to see. In a boxplot, outliers are explicitly labelled separately.

          -

          (LC3.26) Why are histograms inappropriate for visualizing categorical variables?

          -

          Solution: Histograms are for numerical variables i.e. the horizontal part of each histogram bar represents an interval, whereas for a categorical variable each bar represents only one level of the categorical variable.

          -

          (LC3.27) What is the difference between histograms and barplots?

          -

          Solution: See above.

          -

          (LC3.28) How many Envoy Air flights departed NYC in 2013?

          -

          Solution: Envoy Air is carrier code MQ and thus 26397 flights departed NYC in 2013.
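A minimal sketch of how one could check this count with dplyr (an illustration, not from the text):

# Envoy Air has carrier code "MQ"; count its departures from NYC in 2013.
flights %>% 
  filter(carrier == "MQ") %>% 
  nrow()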

          -

          (LC3.29) What was the seventh highest airline in terms of departed flights from NYC in 2013? How could we better present the table to get this answer quickly?

          -

          Solution: What a pain! We’ll see in Chapter 5 on Data Wrangling that applying arrange(desc(n)) will sort this table in descending order of n!
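As a rough sketch of what that would look like, assuming the flights data frame and dplyr are loaded (count() and arrange() are covered in the data wrangling chapter):

# Tabulate departed flights per carrier and sort in descending order,
# so the seventh row answers the question directly.
flights %>% 
  count(carrier) %>% 
  arrange(desc(n))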

          -

          (LC3.30) Why should pie charts be avoided and replaced by barplots?

          -

          Solution: In our opinion, comparisons using horizontal lines are easier than comparing angles and areas of circles.

          -

          (LC3.31) What is your opinion as to why pie charts continue to be used?

          -

          Solution: Legacy?

          -

          (LC3.32) What kinds of questions are not easily answered by looking at the above figure?

          -

          Solution: Because the red, green, and blue bars don’t all start at 0 (only red does), it makes comparing counts hard.

          -

          (LC3.33) What can you say, if anything, about the relationship between airline and airport in NYC in 2013 in regards to the number of departing flights?

          -

Solution: The different airlines prefer different airports. For example, United is mostly a Newark carrier and JetBlue is a JFK carrier. If airlines didn’t prefer airports, each color would be roughly one third of each bar.

          -

          (LC3.34) Why might the side-by-side (AKA dodged) barplot be preferable to a stacked barplot in this case?

          -

Solution: We can easily compare the different airports for a given carrier using a single comparison line, i.e. things are lined up.

          -

          (LC3.35) What are the disadvantages of using a side-by-side (AKA dodged) barplot, in general?

          -

          Solution: It is hard to get totals for each airline.

          -

          (LC3.36) Why is the faceted barplot preferred to the side-by-side and stacked barplots in this case?

          -

          Solution: Not that different than using side-by-side; depends on how you want to organize your presentation.

          -

          (LC3.37) What information about the different carriers at different airports is more easily seen in the faceted barplot?

          -

          Solution: Now we can also compare the different carriers within a particular airport easily too. For example, we can read off who the top carrier for each airport is easily using a single horizontal line.

          -
          -
          -
          -

          6.3 Chapter 4 Solutions

library(dplyr)
library(ggplot2)
library(nycflights13)
library(tidyr)
library(readr)

          (LC4.1) Consider the following data frame of average number of servings of beer, spirits, and wine consumption in three countries as reported in the FiveThirtyEight article Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?

# A tibble: 3 x 4
  country     beer_servings spirit_servings wine_servings
  <chr>               <int>           <int>         <int>
1 Canada                240             122           100
2 South Korea           140              16             9
3 USA                   249             158            84

          This data frame is not in tidy format. What would it look like if it were?

          -

          Solution: There are three variables of information included: country, alcohol type, and number of servings. In tidy format, each of these variables of information are included in their own column.

# A tibble: 9 x 3
  country     `alcohol type` servings
  <chr>       <chr>             <int>
1 Canada      beer                240
2 Canada      spirit              122
3 Canada      wine                100
4 South Korea beer                140
5 South Korea spirit               16
6 South Korea wine                  9
7 USA         beer                249
8 USA         spirit              158
9 USA         wine                 84

          Note that how the rows are sorted is inconsequential in whether or not the data frame is in tidy format. In other words, the following data frame sorted by alcohol type instead of country is equally in tidy format.

# A tibble: 9 x 3
  country     `alcohol type` servings
  <chr>       <chr>             <int>
1 Canada      beer                240
2 South Korea beer                140
3 USA         beer                249
4 Canada      spirit              122
5 South Korea spirit               16
6 USA         spirit              158
7 Canada      wine                100
8 South Korea wine                  9
9 USA         wine                 84

          (LC4.2) What properties of the observational unit do each of lat, lon, alt, tz, dst, and tzone describe for the airports data frame? Note that you may want to use ?airports to get more information.

          -

Solution: lat and lon represent the airport geographic coordinates, alt is the altitude above sea level of the airport (run airports %>% filter(faa == "DEN") to see the altitude of Denver International Airport), tz is the time zone difference with respect to GMT in London UK, dst is the daylight savings time zone, and tzone is the time zone label.

          -

          (LC4.3) Provide the names of variables in a data frame with at least three variables in which one of them is an identification variable and the other two are not. In other words, create your own tidy dataset that matches these conditions.

          -

          Solution:

• In the weather example in LC3.8, the combination of origin, year, month, day, and hour are identification variables, as they identify the observation in question.
• Anything else pertains to measurements of the observations: temp, humid, wind_speed, etc.

          (LC4.4) Convert the dem_score data frame into a tidy data frame and assign the name of dem_score_tidy to the resulting long-formatted data frame.

          -

          Solution: Running the following in the console:

dem_score_tidy <- gather(data = dem_score, key = year, value = democracy_score, -country)

          Let’s now compare the dem_score and dem_score_tidy. dem_score has democracy score information for each year in columns, whereas in dem_score_tidy there are explicit variables year and democracy_score. While both representations of the data contain the same information, we can only use ggplot() to create plots using the dem_score_tidy data frame.

dem_score
# A tibble: 96 x 10
   country    `1952` `1957` `1962` `1967` `1972` `1977` `1982` `1987` `1992`
   <chr>       <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
 1 Albania        -9     -9     -9     -9     -9     -9     -9     -9      5
 2 Argentina      -9     -1     -1     -9     -9     -9     -8      8      7
 3 Armenia        -9     -7     -7     -7     -7     -7     -7     -7      7
 4 Australia      10     10     10     10     10     10     10     10     10
 5 Austria        10     10     10     10     10     10     10     10     10
 6 Azerbaijan     -9     -7     -7     -7     -7     -7     -7     -7      1
 7 Belarus        -9     -7     -7     -7     -7     -7     -7     -7      7
 8 Belgium        10     10     10     10     10     10     10     10     10
 9 Bhutan        -10    -10    -10    -10    -10    -10    -10    -10    -10
10 Bolivia        -4     -3     -3     -4     -7     -7      8      9      9
# ... with 86 more rows
dem_score_tidy
# A tibble: 864 x 3
   country    year  democracy_score
   <chr>      <chr>           <int>
 1 Albania    1952               -9
 2 Argentina  1952               -9
 3 Armenia    1952               -9
 4 Australia  1952               10
 5 Austria    1952               10
 6 Azerbaijan 1952               -9
 7 Belarus    1952               -9
 8 Belgium    1952               10
 9 Bhutan     1952              -10
10 Bolivia    1952               -4
# ... with 854 more rows
          -

          (LC4.5) Read in the life expectancy data stored at https://moderndive.com/data/le_mess.csv and convert it to a tidy data frame.

          -

          Solution: The code is similar

life_expectancy <- read_csv('https://moderndive.com/data/le_mess.csv')
life_expectancy_tidy <- gather(data = life_expectancy, key = year, value = life_expectancy, -country)

We observe the same structure with respect to year in life_expectancy vs life_expectancy_tidy as we did in dem_score vs dem_score_tidy:

life_expectancy

# A tibble: 202 x 67
   country `1951` `1952` `1953` `1954` `1955` `1956` `1957` `1958` `1959` `1960`
   <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1 Afghan…   27.1   27.7   28.2   28.7   29.3   29.8   30.3   30.9   31.4   31.9
 2 Albania   54.7   55.2   55.8   56.6   57.4   58.4   59.5   60.6   61.8   62.9
 3 Algeria   43.0   43.5   44.0   44.4   44.9   45.4   45.9   46.4   47.0   47.5
 4 Angola    31.0   31.6   32.1   32.7   33.2   33.8   34.3   34.9   35.4   36.0
 5 Antigu…   58.3   58.8   59.3   59.9   60.4   60.9   61.4   62.0   62.5   63.0
 6 Argent…   61.9   62.5   63.1   63.6   64.0   64.4   64.7   65     65.2   65.4
 7 Armenia   62.7   63.1   63.6   64.1   64.5   65     65.4   65.9   66.4   66.9
 8 Aruba     59.0   60.0   61.0   61.9   62.7   63.4   64.1   64.7   65.2   65.7
 9 Austra…   68.7   69.1   69.7   69.8   70.2   70.0   70.3   70.9   70.4   70.9
10 Austria   65.2   66.8   67.3   67.3   67.6   67.7   67.5   68.5   68.4   68.8
# ... with 192 more rows, and 56 more variables: `1961` <dbl>, `1962` <dbl>,
#   `1963` <dbl>, `1964` <dbl>, `1965` <dbl>, `1966` <dbl>, `1967` <dbl>,
#   `1968` <dbl>, `1969` <dbl>, `1970` <dbl>, `1971` <dbl>, `1972` <dbl>,
#   `1973` <dbl>, `1974` <dbl>, `1975` <dbl>, `1976` <dbl>, `1977` <dbl>,
#   `1978` <dbl>, `1979` <dbl>, `1980` <dbl>, `1981` <dbl>, `1982` <dbl>,
#   `1983` <dbl>, `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>,
#   `1988` <dbl>, `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>,
#   `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>,
#   `1998` <dbl>, `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>,
#   `2003` <dbl>, `2004` <dbl>, `2005` <dbl>, `2006` <dbl>, `2007` <dbl>,
#   `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
#   `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>

life_expectancy_tidy

# A tibble: 13,332 x 3
   country             year  life_expectancy
   <chr>               <chr>           <dbl>
 1 Afghanistan         1951             27.1
 2 Albania             1951             54.7
 3 Algeria             1951             43.0
 4 Angola              1951             31.0
 5 Antigua and Barbuda 1951             58.3
 6 Argentina           1951             61.9
 7 Armenia             1951             62.7
 8 Aruba               1951             59.0
 9 Australia           1951             68.7
10 Austria             1951             65.2
# ... with 13,322 more rows

(LC4.6) What are common characteristics of "tidy" datasets?

Solution: Rows correspond to observations, while columns correspond to variables.

(LC4.7) What makes "tidy" datasets useful for organizing data?

Solution: Tidy datasets are an organized way of viewing data. We'll see later that this format is required for the ggplot2 and dplyr packages for data visualization and wrangling.

(LC4.8) What are some advantages of data in normal forms? What are some disadvantages?

Solution: When datasets are in normal form, we can easily join them with other datasets! For example, can we join the flights data with the planes data? We'll see this more in Chapter 5!
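As a minimal sketch of such a join (assuming the nycflights13 package and the dplyr join verbs covered in Chapter 5), we can match each flight to information about the plane that flew it via the shared tailnum variable:

library(dplyr)
library(nycflights13)

# Hypothetical illustration: join flights to planes by their common key tailnum
flights %>% 
  inner_join(planes, by = "tailnum")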


          6.4 Chapter 5 Solutions

library(dplyr)
library(ggplot2)
library(nycflights13)

          (LC5.1) What’s another way using the “not” operator ! we could filter only the rows that are not going to Burlington, VT nor Seattle, WA in the flights data frame? Test this out using the code above.

Solution:

# Original in book
not_BTV_SEA <- flights %>% 
  filter(!(dest == "BTV" | dest == "SEA"))

# Alternative way
not_BTV_SEA <- flights %>% 
  filter(!dest == "BTV" & !dest == "SEA")

# Yet another way
not_BTV_SEA <- flights %>% 
  filter(dest != "BTV" & dest != "SEA")

          (LC5.2) Say a doctor is studying the effect of smoking on lung cancer for a large number of patients who have records measured at five year intervals. She notices that a large number of patients have missing data points because the patient has died, so she chooses to ignore these patients in her analysis. What is wrong with this doctor’s approach?

Solution: The missing patients may have died of lung cancer! So ignoring them might seriously bias your results! It is very important to think about what the consequences of ignoring missing data are for your analysis. Ask yourself:

• Is there a systematic reason why certain values are missing? If so, you might be biasing your results!
• If there isn't, then it might be ok to "sweep missing values under the rug."

          (LC5.3) Modify the above summarize function to create summary_temp to also use the n() summary function: summarize(count = n()). What does the returned value correspond to?

Solution: It corresponds to a count of the number of observations/rows:

weather %>% 
  summarize(count = n())

# A tibble: 1 x 1
  count
  <int>
1 26115

          (LC5.4) Why doesn’t the following code work? Run the code line by line instead of all at once, and then look at the data. In other words, run summary_temp <- weather %>% summarize(mean = mean(temp, na.rm = TRUE)) first.

summary_temp <- weather %>%   
  summarize(mean = mean(temp, na.rm = TRUE)) %>% 
  summarize(std_dev = sd(temp, na.rm = TRUE))

          Solution: Consider the output of only running the first two lines:

weather %>%   
  summarize(mean = mean(temp, na.rm = TRUE))

# A tibble: 1 x 1
   mean
  <dbl>
1  55.3

This is because after the first summarize(), the variable temp disappears, as it has been collapsed down to the single value mean. So when we try to run the second summarize(), it can't find the variable temp to compute the standard deviation of.
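One way to get both summaries, assuming the intent is a single row summarizing temp, is to request them inside the same summarize() call:

summary_temp <- weather %>% 
  summarize(mean = mean(temp, na.rm = TRUE),
            std_dev = sd(temp, na.rm = TRUE))
summary_temp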


          (LC5.5) Recall from Chapter 3 when we looked at plots of temperatures by months in NYC. What does the standard deviation column in the summary_monthly_temp data frame tell us about temperatures in New York City throughout the year?

Solution:

month    mean      std_dev
1        35.63566  10.224635
2        34.27060   6.982378
3        39.88007   6.249278
4        51.74564   8.786168
5        61.79500   9.681644
6        72.18400   7.546371
7        80.06622   7.119898
8        74.46847   5.191615
9        67.37129   8.465902
10       60.07113   8.846035
11       44.99043  10.443805
12       38.44180   9.982432

          The standard deviation is a quantification of spread and variability. We see that the period in November, December, and January has the most variation in weather, so you can expect very different temperatures on different days.


          (LC5.6) What code would be required to get the mean and standard deviation temperature for each day in 2013 for NYC?

Solution:

summary_temp_by_day <- weather %>% 
  group_by(year, month, day) %>% 
  summarize(mean = mean(temp, na.rm = TRUE),
            std_dev = sd(temp, na.rm = TRUE))
summary_temp_by_day

# A tibble: 364 x 5
# Groups:   year, month [?]
    year month   day  mean std_dev
   <dbl> <dbl> <int> <dbl>   <dbl>
 1  2013     1     1  37.0    4.00
 2  2013     1     2  28.7    3.45
 3  2013     1     3  30.0    2.58
 4  2013     1     4  34.9    2.45
 5  2013     1     5  37.2    4.01
 6  2013     1     6  40.1    4.40
 7  2013     1     7  40.6    3.68
 8  2013     1     8  40.1    5.77
 9  2013     1     9  43.2    5.40
10  2013     1    10  43.8    2.95
# ... with 354 more rows

Note: group_by(day) is not enough, because day is a value between 1 and 31; to identify a unique day we need to group_by(year, month, day).

library(dplyr)
library(nycflights13)

summary_temp_by_month <- weather %>% 
  group_by(month) %>% 
  summarize(mean = mean(temp, na.rm = TRUE),
            std_dev = sd(temp, na.rm = TRUE))

          (LC5.7) Recreate by_monthly_origin, but instead of grouping via group_by(origin, month), group variables in a different order group_by(month, origin). What differs in the resulting dataset?

Solution:

by_monthly_origin <- flights %>% 
  group_by(month, origin) %>% 
  summarize(count = n())
by_monthly_origin

month  origin  count
1      EWR      9893
1      JFK      9161
1      LGA      7950
2      EWR      9107
2      JFK      8421
2      LGA      7423
3      EWR     10420
3      JFK      9697
3      LGA      8717
4      EWR     10531
4      JFK      9218
4      LGA      8581
5      EWR     10592
5      JFK      9397
5      LGA      8807
6      EWR     10175
6      JFK      9472
6      LGA      8596
7      EWR     10475
7      JFK     10023
7      LGA      8927
8      EWR     10359
8      JFK      9983
8      LGA      8985
9      EWR      9550
9      JFK      8908
9      LGA      9116
10     EWR     10104
10     JFK      9143
10     LGA      9642
11     EWR      9707
11     JFK      8710
11     LGA      8851
12     EWR      9922
12     JFK      9146
12     LGA      9067

The difference is that the rows are now sorted by month first, then origin.
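For comparison, a minimal sketch of the original grouping order from the book (grouping by origin first):

by_origin_monthly <- flights %>% 
  group_by(origin, month) %>% 
  summarize(count = n())
by_origin_monthly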


          (LC5.8) How could we identify how many flights left each of the three airports for each carrier?

Solution: We could summarize the count from each airport using the n() function, which counts rows.

count_flights_by_airport <- flights %>% 
  group_by(origin, carrier) %>% 
  summarize(count = n())
count_flights_by_airport

origin  carrier  count
EWR     9E        1268
EWR     AA        3487
EWR     AS         714
EWR     B6        6557
EWR     DL        4342
EWR     EV       43939
EWR     MQ        2276
EWR     OO           6
EWR     UA       46087
EWR     US        4405
EWR     VX        1566
EWR     WN        6188
JFK     9E       14651
JFK     AA       13783
JFK     B6       42076
JFK     DL       20701
JFK     EV        1408
JFK     HA         342
JFK     MQ        7193
JFK     UA        4534
JFK     US        2995
JFK     VX        3596
LGA     9E        2541
LGA     AA       15459
LGA     B6        6002
LGA     DL       23067
LGA     EV        8826
LGA     F9         685
LGA     FL        3260
LGA     MQ       16928
LGA     OO          26
LGA     UA        8044
LGA     US       13136
LGA     WN        6087
LGA     YV         601

All remarkably similar! Note: the n() function counts rows, whereas the sum(VARIABLE_NAME) function sums all values of a certain numerical variable VARIABLE_NAME.


          (LC5.9) How does the filter operation differ from a group_by followed by a summarize?

Solution:

• filter picks out rows from the original dataset without modifying them, whereas
• group_by %>% summarize computes summaries of numerical variables, and hence reports new values.
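As a minimal sketch of this contrast (using the flights data frame loaded above):

# filter() returns individual rows of flights that meet a condition, unmodified:
flights %>% 
  filter(dest == "BTV")

# group_by() %>% summarize() collapses rows into one new summary row per group:
flights %>% 
  group_by(dest) %>% 
  summarize(count = n())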

          (LC5.10) What do positive values of the gain variable in flights correspond to? What about negative values? And what about a zero value?

Solution:

• Say a flight departed 20 minutes late, i.e. dep_delay = 20.
• Then it arrived 10 minutes late, i.e. arr_delay = 10.
• Then gain = dep_delay - arr_delay = 20 - 10 = 10 is positive, so the flight "made up/gained time in the air."
• A negative gain means the opposite: the arrival delay was larger than the departure delay, so the flight fell even further behind while in the air.
• A gain of 0 means the departure delay and arrival delay were the same, so no time was made up (or lost) in the air. We see in most cases that the gain is near 0 minutes.
• I never understood this. If the pilot says "we're going to make up time in the air" because of a delay by flying faster, why don't you always just fly faster to begin with?
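A minimal sketch of how gain is computed, using mutate() as described above:

flights %>% 
  mutate(gain = dep_delay - arr_delay) %>% 
  select(dep_delay, arr_delay, gain)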

          (LC5.11) Could we create the dep_delay and arr_delay columns by simply subtracting dep_time from sched_dep_time and similarly for arrivals? Try the code out and explain any differences between the result and what actually appears in flights.

Solution: No, because you can't do direct arithmetic on times coded this way. The difference in time between 12:03 and 11:59 is 4 minutes, but 1203 - 1159 = 44.
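As a quick check of why the naive subtraction fails (a sketch assuming the HHMM-coded integer time columns in flights), compare it to the recorded dep_delay:

flights %>% 
  mutate(dep_delay_naive = dep_time - sched_dep_time) %>% 
  select(dep_time, sched_dep_time, dep_delay, dep_delay_naive)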


          (LC5.12) What can we say about the distribution of gain? Describe it in a few sentences using the plot and the gain_summary data frame values.

Solution: Most of the time the gain is a little under zero, and most gains fall between -50 and 50 minutes. There are some extreme cases, however!


          (LC5.13) Looking at Figure 5.7, when joining flights and weather (or, in other words, matching the hourly weather values with each flight), why do we need to join by all of year, month, day, hour, and origin, and not just hour?

Solution: Because hour is simply a value between 0 and 23. To identify a specific hour, we also need to know the year, month, and day it belongs to, as well as the origin airport where the weather was measured.
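A minimal sketch of the join described above (assuming the variable names shown in Figure 5.7):

library(dplyr)
library(nycflights13)

flights_weather_joined <- flights %>% 
  inner_join(weather, by = c("year", "month", "day", "hour", "origin"))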


          (LC5.14) What surprises you about the top 10 destinations from NYC in 2013?


          Solution: This question is subjective! What surprises me is the high number of flights to Boston. Wouldn’t it be easier and quicker to take the train?


          (LC5.15) What are some ways to select all three of the dest, air_time, and distance variables from flights? Give the code showing how to do this in at least three different ways.

Solution:

# The regular way:
flights %>% 
  select(dest, air_time, distance)

# A tibble: 336,776 x 3
   dest  air_time distance
   <chr>    <dbl>    <dbl>
 1 IAH        227     1400
 2 IAH        227     1416
 3 MIA        160     1089
 4 BQN        183     1576
 5 ATL        116      762
 6 ORD        150      719
 7 FLL        158     1065
 8 IAD         53      229
 9 MCO        140      944
10 ORD        138      733
# ... with 336,766 more rows

# Since they are sequential columns in the dataset
flights %>% 
  select(dest:distance)

# A tibble: 336,776 x 3
   dest  air_time distance
   <chr>    <dbl>    <dbl>
 1 IAH        227     1400
 2 IAH        227     1416
 3 MIA        160     1089
 4 BQN        183     1576
 5 ATL        116      762
 6 ORD        150      719
 7 FLL        158     1065
 8 IAD         53      229
 9 MCO        140      944
10 ORD        138      733
# ... with 336,766 more rows

# Not as effective, by removing everything else
flights %>% 
  select(-year, -month, -day, -dep_time, -sched_dep_time, -dep_delay, -arr_time,
         -sched_arr_time, -arr_delay, -carrier, -flight, -tailnum, -origin, 
         -hour, -minute, -time_hour)

# A tibble: 336,776 x 6
   dest  air_time distance  gain hours gain_per_hour
   <chr>    <dbl>    <dbl> <dbl> <dbl>         <dbl>
 1 IAH        227     1400    -9 3.78          -2.38
 2 IAH        227     1416   -16 3.78          -4.23
 3 MIA        160     1089   -31 2.67         -11.6 
 4 BQN        183     1576    17 3.05           5.57
 5 ATL        116      762    19 1.93           9.83
 6 ORD        150      719   -16 2.5           -6.4 
 7 FLL        158     1065   -24 2.63          -9.11
 8 IAD         53      229    11 0.883         12.5 
 9 MCO        140      944     5 2.33           2.14
10 ORD        138      733   -10 2.3           -4.35
# ... with 336,766 more rows

          (LC5.16) How could one use starts_with, ends_with, and contains to select columns from the flights data frame? Provide three different examples in total: one for starts_with, one for ends_with, and one for contains.

Solution:

# Anything that starts with "d"
flights %>% 
  select(starts_with("d"))

# A tibble: 336,776 x 5
     day dep_time dep_delay dest  distance
   <int>    <int>     <dbl> <chr>    <dbl>
 1     1      517         2 IAH       1400
 2     1      533         4 IAH       1416
 3     1      542         2 MIA       1089
 4     1      544        -1 BQN       1576
 5     1      554        -6 ATL        762
 6     1      554        -4 ORD        719
 7     1      555        -5 FLL       1065
 8     1      557        -3 IAD        229
 9     1      557        -3 MCO        944
10     1      558        -2 ORD        733
# ... with 336,766 more rows

# Anything related to delays:
flights %>% 
  select(ends_with("delay"))

# A tibble: 336,776 x 2
   dep_delay arr_delay
       <dbl>     <dbl>
 1         2        11
 2         4        20
 3         2        33
 4        -1       -18
 5        -6       -25
 6        -4        12
 7        -5        19
 8        -3       -14
 9        -3        -8
10        -2         8
# ... with 336,766 more rows

# Anything related to departures:
flights %>% 
  select(contains("dep"))

# A tibble: 336,776 x 3
   dep_time sched_dep_time dep_delay
      <int>          <int>     <dbl>
 1      517            515         2
 2      533            529         4
 3      542            540         2
 4      544            545        -1
 5      554            600        -6
 6      554            558        -4
 7      555            600        -5
 8      557            600        -3
 9      557            600        -3
10      558            600        -2
# ... with 336,766 more rows

          (LC5.17) Why might we want to use the select() function on a data frame?

Solution: To narrow down the data frame so that it's easier to look at, for example when using View().
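For instance, a minimal sketch with a hypothetical choice of columns:

flights %>% 
  select(carrier, flight, dest) %>% 
  View()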


          (LC5.18) Create a new data frame that shows the top 5 airports with the largest arrival delays from NYC in 2013.

Solution:

top_five <- flights %>% 
  group_by(dest) %>% 
  summarize(avg_delay = mean(arr_delay, na.rm = TRUE)) %>% 
  arrange(desc(avg_delay)) %>% 
  top_n(n = 5)

Selecting by avg_delay

top_five

# A tibble: 5 x 2
  dest  avg_delay
  <chr>     <dbl>
1 CAE        41.8
2 TUL        33.7
3 OKC        30.6
4 JAC        28.1
5 TYS        24.1

          (LC5.19) Using the datasets included in the nycflights13 package, compute the available seat miles for each airline sorted in descending order. After completing all the necessary data wrangling steps, the resulting data frame should have 16 rows (one for each airline) and 2 columns (airline name and available seat miles). Here are some hints:

1. Crucial: Unless you are very confident in what you are doing, it is worthwhile not to start coding right away. Rather, first sketch out on paper all the necessary data wrangling steps, not using exact code but rather high-level pseudocode that is informal yet detailed enough to articulate what you are doing. This way you won't confuse what you are trying to do (the algorithm) with how you are going to do it (writing dplyr code).
2. Take a close look at all the datasets using the View() function: flights, weather, planes, airports, and airlines, to identify which variables are necessary to compute available seat miles.
3. Figure 5.7 above, showing how the various datasets can be joined, will also be useful.
4. Consider the data wrangling verbs in Table 5.1 as your toolbox!

There are many possible ways to sketch out such pseudocode. Based on our own pseudocode, let's first display the entire solution.

flights %>% 
  inner_join(planes, by = "tailnum") %>% 
  select(carrier, seats, distance) %>% 
  mutate(ASM = seats * distance) %>% 
  group_by(carrier) %>% 
  summarize(ASM = sum(ASM, na.rm = TRUE)) %>% 
  arrange(desc(ASM))

# A tibble: 16 x 2
   carrier         ASM
   <chr>         <dbl>
 1 UA      15516377526
 2 DL      10532885801
 3 B6       9618222135
 4 AA       3677292231
 5 US       2533505829
 6 VX       2296680778
 7 EV       1817236275
 8 WN       1718116857
 9 9E        776970310
10 HA        642478122
11 AS        314104736
12 FL        219628520
13 F9        184832280
14 YV         20163632
15 MQ          7162420
16 OO          1299835

          Let’s now break this down step-by-step. To compute the available seat miles for a given flight, we need the distance variable from the flights data frame and the seats variable from the planes data frame, necessitating a join by the key variable tailnum as illustrated in Figure 5.7. To keep the resulting data frame easy to view, we’ll select() only these two variables and carrier:

flights %>% 
  inner_join(planes, by = "tailnum") %>% 
  select(carrier, seats, distance)

# A tibble: 284,170 x 3
   carrier seats distance
   <chr>   <int>    <dbl>
 1 UA        149     1400
 2 UA        149     1416
 3 AA        178     1089
 4 B6        200     1576
 5 DL        178      762
 6 UA        191      719
 7 B6        200     1065
 8 EV         55      229
 9 B6        200      944
10 B6        200     1028
# ... with 284,160 more rows

          Now for each flight we can compute the available seat miles ASM by multiplying the number of seats by the distance via a mutate():

flights %>% 
  inner_join(planes, by = "tailnum") %>% 
  select(carrier, seats, distance) %>% 
  # Added:
  mutate(ASM = seats * distance)

# A tibble: 284,170 x 4
   carrier seats distance    ASM
   <chr>   <int>    <dbl>  <dbl>
 1 UA        149     1400 208600
 2 UA        149     1416 210984
 3 AA        178     1089 193842
 4 B6        200     1576 315200
 5 DL        178      762 135636
 6 UA        191      719 137329
 7 B6        200     1065 213000
 8 EV         55      229  12595
 9 B6        200      944 188800
10 B6        200     1028 205600
# ... with 284,160 more rows

          Next we want to sum the ASM for each carrier. We achieve this by first grouping by carrier and then summarizing using the sum() function:

flights %>% 
  inner_join(planes, by = "tailnum") %>% 
  select(carrier, seats, distance) %>% 
  mutate(ASM = seats * distance) %>% 
  # Added:
  group_by(carrier) %>% 
  summarize(ASM = sum(ASM))

# A tibble: 16 x 2
   carrier         ASM
   <chr>         <dbl>
 1 9E        776970310
 2 AA       3677292231
 3 AS        314104736
 4 B6       9618222135
 5 DL      10532885801
 6 EV       1817236275
 7 F9        184832280
 8 FL        219628520
 9 HA        642478122
10 MQ          7162420
11 OO          1299835
12 UA      15516377526
13 US       2533505829
14 VX       2296680778
15 WN       1718116857
16 YV         20163632

However, because for certain carriers certain flights have missing NA values, the resulting table can also return NA's. We can eliminate these by adding the na.rm = TRUE argument to sum(), telling R that we want to ignore the NA's when computing the sum. We saw this in the earlier section on summarize:

flights %>% 
  inner_join(planes, by = "tailnum") %>% 
  select(carrier, seats, distance) %>% 
  mutate(ASM = seats * distance) %>% 
  group_by(carrier) %>% 
  # Modified:
  summarize(ASM = sum(ASM, na.rm = TRUE))

# A tibble: 16 x 2
   carrier         ASM
   <chr>         <dbl>
 1 9E        776970310
 2 AA       3677292231
 3 AS        314104736
 4 B6       9618222135
 5 DL      10532885801
 6 EV       1817236275
 7 F9        184832280
 8 FL        219628520
 9 HA        642478122
10 MQ          7162420
11 OO          1299835
12 UA      15516377526
13 US       2533505829
14 VX       2296680778
15 WN       1718116857
16 YV         20163632

          Finally, we arrange() the data in desc()ending order of ASM.

flights %>% 
  inner_join(planes, by = "tailnum") %>% 
  select(carrier, seats, distance) %>% 
  mutate(ASM = seats * distance) %>% 
  group_by(carrier) %>% 
  summarize(ASM = sum(ASM, na.rm = TRUE)) %>% 
  # Added:
  arrange(desc(ASM))

# A tibble: 16 x 2
   carrier         ASM
   <chr>         <dbl>
 1 UA      15516377526
 2 DL      10532885801
 3 B6       9618222135
 4 AA       3677292231
 5 US       2533505829
 6 VX       2296680778
 7 EV       1817236275
 8 WN       1718116857
 9 9E        776970310
10 HA        642478122
11 AS        314104736
12 FL        219628520
13 F9        184832280
14 YV         20163632
15 MQ          7162420
16 OO          1299835

While the above data frame is correct, the IATA carrier code is not always useful. For example, what carrier is WN? We can address this by joining with the airlines dataset using carrier as the key variable. While this step is not absolutely required, it goes a long way toward making the table easier to make sense of. It is important to be empathetic with the ultimate consumers of your presented data!

flights %>% 
  inner_join(planes, by = "tailnum") %>% 
  select(carrier, seats, distance) %>% 
  mutate(ASM = seats * distance) %>% 
  group_by(carrier) %>% 
  summarize(ASM = sum(ASM, na.rm = TRUE)) %>% 
  arrange(desc(ASM)) %>% 
  # Added:
  inner_join(airlines, by = "carrier")

# A tibble: 16 x 3
   carrier         ASM name                       
   <chr>         <dbl> <chr>                      
 1 UA      15516377526 United Air Lines Inc.      
 2 DL      10532885801 Delta Air Lines Inc.       
 3 B6       9618222135 JetBlue Airways            
 4 AA       3677292231 American Airlines Inc.     
 5 US       2533505829 US Airways Inc.            
 6 VX       2296680778 Virgin America             
 7 EV       1817236275 ExpressJet Airlines Inc.   
 8 WN       1718116857 Southwest Airlines Co.     
 9 9E        776970310 Endeavor Air Inc.          
10 HA        642478122 Hawaiian Airlines Inc.     
11 AS        314104736 Alaska Airlines Inc.       
12 FL        219628520 AirTran Airways Corporation
13 F9        184832280 Frontier Airlines Inc.     
14 YV         20163632 Mesa Airlines Inc.         
15 MQ          7162420 Envoy Air                  
16 OO          1299835 SkyWest Airlines Inc.      


DataCamp

The introductory basic regression analysis below was the inspiration for a large part of ModernDive co-author Albert Y. Kim's DataCamp course "Modeling with Data in the Tidyverse." If you're interested in complementing your learning below in an interactive online environment, you can access the course on DataCamp. The relevant chapters are Chapter 1 "Introduction to Modeling" and Chapter 2 "Modeling with Basic Regression".

          6.1 One numerical explanatory variable

6.1.1 Exploratory data analysis

TABLE 6.1: Random sample of 5 instructors

6.1.3 Observed/fitted values and residuals

For example, say we are interested in the 21st instructor in this dataset:

TABLE 6.3: Data for 21st instructor
        • residual = 0.153 = 4.4 - 4.25 is the value of the residual for this instructor. In other words, the model was off by 0.153 teaching score units for this instructor.
        • More development of this idea appears in Section 6.3.3 and we encourage you to read that section after you investigate residuals.
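A minimal sketch of how such observed/fitted/residual values can be computed (assuming the evals data, the moderndive package, and the simple score ~ bty_avg model used in this section):

library(moderndive)

# Fit the simple linear regression and extract observed values, fitted values, and residuals
score_model <- lm(score ~ bty_avg, data = evals)
get_regression_points(score_model)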


6.2.1 Exploratory data analysis

We see that this data is left-skewed/negatively skewed: there are a few countries with very low life expectancy that are bringing down the mean life expectancy. However, the median is less sensitive to the effects of such outliers. Hence the median is greater than the mean in this case. Let's proceed by comparing median and mean life expectancy between continents by adding a group_by(continent) to the above code:

          lifeExp_by_continent <- gapminder2007 %>%
             group_by(continent) %>%
             summarize(median = median(lifeExp), mean = mean(lifeExp))
We see now that there are differences in life expectancy between the continents. For example let's focus on only medians. While the median life expectancy across all \(n = 142\) countries in 2007 was 71.935, the median life expectancy across the \(n = 52\) countries in Africa was only 52.927.

Let's create a corresponding visualization. One way to compare the life expectancy of countries in different continents would be via a faceted histogram. Recall we saw back in the Data Visualization chapter, specifically Section 3.6, that facets allow us to split a visualization by the different levels of a categorical variable or factor variable. In Figure 6.10, the variable we facet by is continent, which is categorical with five levels, each corresponding to the five continents of the world.

ggplot(gapminder2007, aes(x = lifeExp)) +
   geom_histogram(binwidth = 5, color = "white") +
   labs(x = "Life expectancy", y = "Number of countries") +
   facet_wrap(~ continent)

Another way would be via a geom_boxplot where we map the categorical variable continent to the \(x\)-axis and the life expectancies within each continent on the \(y\)-axis; we do this in Figure 6.11.

ggplot(gapminder2007, aes(x = continent, y = lifeExp)) +
   geom_boxplot() +
   labs(x = "Continent", y = "Life expectancy (years)")

• Africa and Asia have much more spread/variation in life expectancy as indicated by the interquartile range (the height of the boxes).
• Oceania has almost no spread/variation, but this might in large part be due to the fact there are only two countries in Oceania: Australia and New Zealand.

Now, let's start making comparisons of life expectancy between continents. Let's use Africa as a baseline for comparison. Why Africa? Only because it happened to be first alphabetically; we could have just as appropriately used the Americas as the baseline for comparison. Using the "eyeball test" (just using our eyes to see if anything stands out), we make the following observations about differences in median life expectancy compared to the baseline of Africa:

        1. The median life expectancy of the Americas is roughly 20 years greater.
        2. The median life expectancy of Asia is roughly 20 years greater.

          6.2.2 Linear regression

\[\begin{align}
\widehat{\text{life exp}} &= b_0 + b_{\text{Amer}}\cdot\mathbb{1}_{\mbox{Amer}}(x) + b_{\text{Asia}}\cdot\mathbb{1}_{\mbox{Asia}}(x) + b_{\text{Euro}}\cdot\mathbb{1}_{\mbox{Euro}}(x) + b_{\text{Ocean}}\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= b_0 + b_{\text{Amer}}\cdot 0 + b_{\text{Asia}}\cdot 0 + b_{\text{Euro}}\cdot 0 + b_{\text{Ocean}}\cdot 0\\
&= b_0\\
&= 54.8
\end{align}\]

i.e. all four of the indicator variables are equal to 0. Recall we stated earlier that we would treat Africa as the baseline group for comparison. Furthermore, this value corresponds to the group mean life expectancy for all African countries in Table 6.7.

Next, \(b_{\text{Amer}}\) = continentAmericas = 18.8 is the difference in mean life expectancy of countries in the Americas relative to Africa, or in other words, on average countries in the Americas had life expectancy 18.8 years greater. The fitted value yielded by this equation is:

\[\begin{align}
\widehat{\text{life exp}} &= b_0 + b_{\text{Amer}}\cdot\mathbb{1}_{\mbox{Amer}}(x) + b_{\text{Asia}}\cdot\mathbb{1}_{\mbox{Asia}}(x) + b_{\text{Euro}}\cdot\mathbb{1}_{\mbox{Euro}}(x) + b_{\text{Ocean}}\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= b_0 + b_{\text{Amer}}\cdot 1 + b_{\text{Asia}}\cdot 0 + b_{\text{Euro}}\cdot 0 + b_{\text{Ocean}}\cdot 0\\
&= b_0 + b_{\text{Amer}}\\
&= 72.9
\end{align}\]

          i.e. in this case, only the indicator function \(\mathbb{1}_{\mbox{Amer}}(x)\) is equal to 1, but all others are 0. Recall that 72.9 corresponds to the group mean life expectancy for all countries in the Americas in Table 6.7.

Similarly, \(b_{\text{Asia}}\) = continentAsia = 15.9 is the difference in mean life expectancy of Asian countries relative to African countries, or in other words, on average countries in Asia had life expectancy 15.9 years greater than Africa. The fitted value yielded by this equation is:

\[\begin{align}
\widehat{\text{life exp}} &= b_0 + b_{\text{Amer}}\cdot\mathbb{1}_{\mbox{Amer}}(x) + b_{\text{Asia}}\cdot\mathbb{1}_{\mbox{Asia}}(x) + b_{\text{Euro}}\cdot\mathbb{1}_{\mbox{Euro}}(x) + b_{\text{Ocean}}\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot 0 + 15.9\cdot 1 + b_{\text{Euro}}\cdot 0 + b_{\text{Ocean}}\cdot 0\\
&= 54.8 + 15.9\\
&= 70.7
\end{align}\]

6.2.3 Observed/fitted values and residuals

          What do fitted values \(\widehat{y}\) and residuals \(y - \widehat{y}\) correspond to when the explanatory variable \(x\) is categorical? Let’s investigate these values for the first 10 countries in the gapminder2007 dataset:

regression_points

TABLE 6.9: First 10 out of 142 countries
• The fitted values lifeExp_hat \(\widehat{\text{lifeexp}}\). Countries in Africa have the same fitted value of 54.8, which is the mean life expectancy of Africa. Countries in Asia have the same fitted value of 70.7, which is the mean life expectancy of Asia. This similarly holds for countries in the Americas, Europe, and Oceania.
• The residual column is simply \(y - \widehat{y}\) = lifeexp - lifeexp_hat. These values can be interpreted as a particular country's deviation from its continent's mean life expectancy. For example, the first row of this dataset corresponds to Afghanistan, and the residual of \(-26.9 = 43.8 - 70.7\) is Afghanistan's life expectancy minus the mean life expectancy of all Asian countries.
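A minimal sketch of how these regression points can be reproduced (under assumed names: the 2007 gapminder data, the moderndive package, and a regression of lifeExp on continent as described above):

library(dplyr)
library(gapminder)
library(moderndive)

# Restrict to 2007, fit the model with a categorical explanatory variable,
# and extract observed values, fitted values, and residuals
gapminder2007 <- gapminder %>% 
  filter(year == 2007)
lifeExp_model <- lm(lifeExp ~ continent, data = gapminder2007)
get_regression_points(lifeExp_model)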

6.4 Conclusion

6.4.1 Additional resources

An R script file of all R code used in this chapter is available here.

6.4.2 What's to come?

In this chapter, you've seen what we call "basic regression" when you only have one explanatory variable. In Chapter 7, we'll study multiple regression where we have more than one explanatory variable! In particular, we'll see why we've been conducting the residual analyses from Subsections 11.4.1 and 11.4.2. We are actually verifying some very important assumptions that must be met for the std_error (standard error), p_value, lower_ci and upper_ci (the end-points of the confidence intervals) columns in our regression tables to have valid interpretation. Again, don't worry for now if you don't understand what these terms mean. After the next chapter on multiple regression, we'll dive in!


DataCamp

The approach taken below of using more than one variable of information in models using multiple regression is identical to that taken in ModernDive co-author Albert Y. Kim's DataCamp course "Modeling with Data in the Tidyverse." If you're interested in complementing your learning below in an interactive online environment, you can access the course on DataCamp. The relevant chapters are Chapter 1 "Introduction to Modeling" and Chapter 3 "Modeling with Multiple Regression."

          7.1 Two numerical explanatory variables

7.1.1 Exploratory data analysis

          Previously in Figure 6.6, we plotted a “best-fitting” regression line through a set of points where the numerical outcome variable \(y\) was teaching score and a single numerical explanatory variable \(x\) was bty_avg. What is the analogous concept when we have two numerical predictor variables? Instead of a best-fitting line, we now have a best-fitting plane, which is a 3D generalization of lines which exist in 2D. Click here to open an interactive plot of the regression plane shown below in your browser. Move the image around, zoom in, and think about how this plane generalizes the concept of a linear regression line to three dimensions.

FIGURE 7.2: Regression plane
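As a minimal, purely illustrative sketch (with hypothetical simulated variables, not the book's data), a model with two numerical explanatory variables is fit the same way as one with a single explanatory variable; the fitted coefficients now describe a plane rather than a line:

# Hypothetical simulated data with two numerical explanatory variables x1 and x2
set.seed(76)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- 3 + 2 * df$x1 - 1 * df$x2 + rnorm(100)

# One intercept plus one slope per explanatory variable defines the regression plane
model_plane <- lm(y ~ x1 + x2, data = df)
coef(model_plane)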

7.1.3 Observed/fitted values and residuals

• Balance_hat corresponds to \(\widehat{y}\) (the fitted value)
• residual corresponds to \(y - \widehat{y}\) (the residual)

          7.2.1 Exploratory data analysis

Furthermore, let's compute the correlation between the two numerical variables we have: score and age. Recall that correlation coefficients only exist between numerical variables. We observe that they are weakly negatively correlated.

7.2.2 Multiple regression: Parallel slopes model

get_regression_table(score_model_2)
7.2.3 Multiple regression: Interaction model

get_regression_table(score_model_interaction)

TABLE 7.6: Regression table

          Let’s summarize these values in a table:

TABLE 7.7: Regression table

7.2.4 Observed/fitted values and residuals

• score_hat corresponds to \(\widehat{y} = \widehat{\mbox{score}}\) (the fitted value)
• residual corresponds to the residual \(y - \widehat{y}\)

TABLE 7.8: Comparison of male and female intercepts and age slopes

Observe that for each group we have their names, the number of red_balls they obtained, and the corresponding proportion out of 50 balls that were red, named prop_red. Observe that we also have a variable replicate enumerating each of the 33 groups; we chose this name because each row can be viewed as one instance of a replicated activity: using the shovel to remove 50 balls and computing the proportion of those balls that are red.

We visualize the distribution of these 33 proportions using a geom_histogram() with binwidth = 0.05 in Figure 8.7, which is appropriate since the variable prop_red is numerical. This computer-generated histogram matches our hand-drawn histogram from the earlier Figure 8.6.

ggplot(tactile_prop_red, aes(x = prop_red)) +
   geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
   labs(x = "Proportion of 50 balls that were red")

8.1.3 Using the shovel 33 times

          FIGURE 8.7: Distribution of 33 proportions based on 33 samples of size 50

8.1.4 What are we doing here?

What we just demonstrated in this activity is the statistical concept of sampling. We would like to know the proportion of the bowl's balls that are red, but because the bowl has a very large number of balls, performing an exhaustive count of the number of red and white balls in the bowl would be very costly in terms of both time and energy. We therefore extract a sample of 50 balls using the shovel. Using this sample of 50 balls, we estimate the proportion of the bowl's balls that are red using the proportion of the shovel's balls that are red. This estimate in our earlier example was 17 red balls out of 50 balls = 34%. Moreover, because we mixed the balls before each use of the shovel, the samples were randomly drawn. Because each sample was drawn at random, the samples were different from each other. Because the samples were different from each other, we obtained the different proportions red observed in Table 8.1. This is known as the concept of sampling variation.

          +

In Section 8.2 we’ll mimic the hands-on sampling activity we just performed in a computer simulation; using a computer will allow us to repeat the above sampling activity much more than 33 times. Using a computer, not only will we be able to repeat the hands-on activity a very large number of times, but we will also be able to repeat it using different-sized shovels.

          +

The purpose of these simulations is to develop an understanding of two key concepts relating to sampling: sampling variation and the role that sample size plays in this variation. To this end, we’ll present you with definitions, terminology, and notation related to sampling in Section 8.3. As with many disciplines, there are definitions, terminology, and notation that seem very inaccessible and even confusing at first. However, as with many difficult topics, if you truly understand the underlying concepts and practice, practice, practice, you’ll be able to master these topics.

          +

To tie the contents of this chapter to the real world, we’ll present an example of one of the most recognizable uses of sampling: polls. In Section 8.4 we’ll look at a particular case study: a 2013 poll on then President Obama’s popularity among young Americans, conducted by the Harvard Kennedy School’s Institute of Politics.

          +

          We’ll close this chapter by generalizing the above sampling from the bowl activity to other scenarios, distinguishing between random sampling and random assignment, presenting the theoretical result underpinning all our results, and presenting a few mathematical formulas that relate to the concepts and ideas explored in this chapter.

          +

          8.2 Computer simulation

          -

          What we performed in Section 8.1 is a simulation of sampling. The crowd-sourced Wikipedia definition of a simulation states: “A simulation is an approximate imitation of the operation of a process or system.”1 One example of simulations in practice are a flight simulators: before pilots in training are allowed to fly an actual plane, they first practice on a computer that attempts to mimic the reality of flying an actual plane as best as possible.

          -

          Now you might be thinking that simulations must necssarily take place on computer. However, this is not necessarily true. Take for example crash test dummies: before cars are made available to the market, automobile engineers test their safety by mimicking the reality for passengeres of being in an automobile crash. To distinguish between these two simulation types, we’ll term a simulation performed in real-life as a “tactile” simulation done with your hands and to the touch as opposed to a “virtual” simulation performed on a computer.

          +

What we performed in Section 8.1 is a simulation of sampling. In other words, we were not in a real-life sampling scenario in order to answer a real-life question, but rather we were mimicking such a scenario with our bowl and shovel. The crowd-sourced Wikipedia definition of a simulation states: “A simulation is an approximate imitation of the operation of a process or system.” One example of a simulation in practice is a flight simulator: before pilots in training are allowed to fly an actual plane, they first practice on a computer that attempts to mimic the reality of flying an actual plane as best as possible.

          +

Now you might be thinking that simulations must necessarily take place on a computer. However, this is not necessarily true. Take for example crash test dummies: before cars are made available to the market, automobile engineers test their safety by mimicking the reality for passengers of being in an automobile crash. To distinguish between these two simulation types, we’ll term a simulation performed in real life a “tactile” simulation, done with your hands and to the touch, as opposed to a “virtual” simulation performed on a computer.

+

          -

          So while in Section 8.1 we performed a “tactile” simulation of sampling using an actual bowl and an actual shovel with our hands, in this section we’ll perform a “virtual” simulation using a virtual bowl and a virtual shovel with our computers.

          +

          So while in Section 8.1 we performed a “tactile” simulation of sampling using an actual bowl and an actual shovel with our hands, in this section we’ll perform a “virtual” simulation using a “virtual” bowl and a “virtual” shovel with our computers.

          -
          -

          8.2.1 Using shovel once

          -

          Let’s start by perfoming the virtual analogue of the tactile sampling simulation we performed in 8.1. We first need a virtual analogue of the bowl seen in Figure 8.1. To this end, we created a data frame called bowl whose rows correspond exactly with the contents of the actual bowl; we’ve included this data frame in the moderndive package.

          +
          +

          8.2.1 Using the virtual shovel once

          +

Let’s start by performing the virtual analogue of the tactile sampling simulation we performed in Section 8.1. We first need a virtual analogue of the bowl seen in Figure 8.1. To this end, we included a data frame bowl in the moderndive package whose rows correspond exactly with the contents of the actual bowl.

bowl
# A tibble: 2,400 x 2
    ball_ID color
 ...
  9       9 red
 10      10 white
# … with 2,390 more rows
          -

          Observe in the output that bowl has 2400 rows, telling us that the bowl contains 2400 equally-sized balls. The first variable ball_ID is used merely as an “identification variable” for this data frame as discussed in Subsection ??; none of the balls in the actual bowl are marked with numbers. The second variable color indicates whether a particular virtual ball i s red or white. Run View(bowl) in RStudio and scroll through the contents to convince yourselves that bowl is indeed a virtual version of the actual bowl in Figure 8.1.

          -

          Now that we have a virtual analogue of our bowl, we now need a virtual analogue for the shovel seen in Figure 8.2 to generate our random samples of 50 balls. We’re going to use the rep_sample_n() function included in the moderndive package that allows us to take repeated/replicated samples of sizen. Run the following and explorevirtual_shovel`’s contents in the spreadsheet viewer.

+

          Observe in the output that bowl has 2400 rows, telling us that the bowl contains 2400 equally-sized balls. The first variable ball_ID is used merely as an “identification variable” for this data frame as discussed in Subsection ??; none of the balls in the actual bowl are marked with numbers. The second variable color indicates whether a particular virtual ball is red or white. View the contents of the bowl in RStudio’s data viewer and scroll through the contents to convince yourselves that bowl is indeed a virtual version of the actual bowl in Figure 8.1.

          +

          Now that we have a virtual analogue of our bowl, we now need a virtual analogue for the shovel seen in Figure 8.2; we’ll use this virtual shovel to generate our virtual random samples of 50 balls. We’re going to use the rep_sample_n() function included in the moderndive package. This function allows us to take repeated, or replicated, samples of size n. Run the following and explore virtual_shovel’s contents in the RStudio viewer.

          virtual_shovel <- bowl %>% 
             rep_sample_n(size = 50)
           View(virtual_shovel)
          @@ -991,10 +986,10 @@

          8.2.1 Using shovel once

          -

          The ball_ID variable identifies which of balls from bowl are included in our sample of 50 balls and color denotes it’s color. However what does the replicate variable indicate? In virtual_shovel’s case, replicate is equal to 1 for all 50 rows. This is telling us that these 50 rows correspond to a first repeated/replicated use of the shovel, in other words our first sample. We’ll see below when we “virtually” take 33 samples below, replicate will take values between 1 and 33. Before we do this, let’s compute the proportion of balls in our virtual sample of size 50 that are red. We’ll be using the dplyr data wrangling verbs you learned in Chapter 4. Let’s breakdown the steps individually:

          -

          First, for each of our 50 sampled balls, identify if it is red or not using the boolean algebra. For every row where color == "red", the boolean TRUE is returned and for every row where color is not equal to "red", the boolean FALSE is returned. Let’s create a new boolean variable is_red using the mutate() function from Section 4.5:

          +

The ball_ID variable identifies which of the balls from bowl are included in our sample of 50 balls and color denotes its color. However what does the replicate variable indicate? In virtual_shovel’s case, replicate is equal to 1 for all 50 rows. This is telling us that these 50 rows correspond to a first repeated/replicated use of the shovel, in other words, our first sample. We’ll see below that when we “virtually” take 33 samples, replicate will take values between 1 and 33. Before we do this, let’s compute the proportion of balls in our virtual sample of size 50 that are red using the dplyr data wrangling verbs you learned in Chapter 4. Let’s break down the steps individually:

          +

First, for each of our 50 sampled balls, identify whether it is red using a test for equality with ==. For every row where color == "red", the Boolean TRUE is returned and for every row where color is not equal to "red", the Boolean FALSE is returned. Let’s create a new Boolean variable is_red using the mutate() function from Section 4.5:

          virtual_shovel %>% 
          -  mutate(is_red = color == "red")
          + mutate(is_red = (color == "red"))
# A tibble: 50 x 4
# Groups:   replicate [1]
   replicate ball_ID color is_red
 ...
# … with 40 more rows

Second, we compute the number of balls out of 50 that are red using the summarize() function. Recall from Section 4.3 that summarize() takes a data frame with many rows and returns a data frame with a single row containing summary statistics that you specify, like mean() and median(). In this case we use sum():

          virtual_shovel %>% 
          -  mutate(is_red = color == "red") %>% 
          +  mutate(is_red = (color == "red")) %>% 
             summarize(num_red = sum(is_red))  
          # A tibble: 1 x 2
             replicate num_red
                 <int>   <int>
           1         1      17
          -

          Why does this work? Because R treats TRUE like the number 1 and FALSE like the number 0. So summing the number of TRUE’s and FALSE’s is equivalent to summing 1’s and 0’s, which in the end which counts the number of balls where color is red.

          +

          Why does this work? Because R treats TRUE like the number 1 and FALSE like the number 0. So summing the number of TRUE’s and FALSE’s is equivalent to summing 1’s and 0’s, which in the end counts the number of balls where color is red. In our case, 17 of the 50 balls were red.
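As a tiny illustration of this point (our addition, not code from the text), summing a logical vector in base R counts its TRUEs:

# TRUE is treated as 1 and FALSE as 0, so the sum counts the TRUEs.
sum(c(TRUE, FALSE, TRUE, FALSE, FALSE))
#> [1] 2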

          Third and last, we compute the proportion of the 50 sampled balls that are red by dividing num_red by 50:

virtual_shovel %>% 
  mutate(is_red = color == "red") %>% 
  summarize(num_red = sum(is_red)) %>% 
  mutate(prop_red = num_red / 50)

  replicate num_red prop_red
      <int>   <int>    <dbl>
1         1      17     0.34
          -

          Let’s make the above code a little more compact and succinct by combining the first mutate() and the summarize() as follows:

          +

          In other words, this “virtual” sample’s balls were 34% red. Let’s make the above code a little more compact and succinct by combining the first mutate() and the summarize() as follows:

          virtual_shovel %>% 
             summarize(num_red = sum(color == "red")) %>% 
             mutate(prop_red = num_red / 50)
  replicate num_red prop_red
      <int>   <int>    <dbl>
1         1      17     0.34
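As an aside (our addition rather than the book’s code), the same fact lets us skip the division entirely: mean() of the logical comparison is itself the proportion of TRUEs. A minimal sketch, assuming the same virtual_shovel data frame:

# The mean of a logical vector is the proportion of TRUEs, i.e. the
# proportion of the 50 sampled balls that are red.
virtual_shovel %>% 
  summarize(prop_red = mean(color == "red"))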
          -

          Great! 44% of virtual_shovel’s 50 balls were red! So based on this particular sample, our guess at the proportion of bowl’s balls that are red is 44%. But remember from our earlier tactile sampling activity, that if we repeated this sampling, we would not necessarily obtain a sample of 50 balls with 44% of them being red; there will likely be some variation.

          -

          In fact in Table 8.2 we displayed 33 such proportions based on 33 tactile samples and then in Figure 8.6 we visualized the distribution of the 33 proportions in a histogram. Let’s now perform the virtual analogue of having 33 groups of students use the sampling shovel!

          +

          Great! 34% of virtual_shovel’s 50 balls were red! So based on this particular sample, our guess at the proportion of the bowl’s balls that are red is 34%. But remember from our earlier tactile sampling activity that if we repeated this sampling, we would not necessarily obtain a sample of 50 balls with 34% of them being red again; there will likely be some variation. In fact in Table 8.2 we displayed 33 such proportions based on 33 tactile samples and then in Figure 8.6 we visualized the distribution of the 33 proportions in a histogram. Let’s now perform the virtual analogue of having 33 groups of students use the sampling shovel!

          -
          -

          8.2.2 Using shovel 33 times

          -

          Recall however in our tactile sampling exercise in Section 8.1 above that we had 33 groups of students each use the shovel, yielding 33 samples of size 50 balls, which we used to then compute 33 proportions. In other words we repeated/replicated the sampling activity 33 times. We can perform this repeated/replicated sampling virtually by once again using our virtual shovel funciton rep_sample_n(), but by adding the reps = 33 argument indicating we want to repeat the sampling 33 times.

          -

          Be sure to scroll through the contents of virtual_samples in RStudio’s spreadsheet viewer.

          +
          +

          8.2.2 Using the virtual shovel 33 times

          +

          Recall that in our tactile sampling exercise in Section 8.1 we had 33 groups of students each use the shovel, yielding 33 samples of size 50 balls, which we then used to compute 33 proportions. In other words we repeated/replicated using the shovel 33 times. We can perform this repeated/replicated sampling virtually by once again using our virtual shovel function rep_sample_n(), but by adding the reps = 33 argument, indicating we want to repeat the sampling 33 times. Be sure to scroll through the contents of virtual_samples in RStudio’s viewer.

          virtual_samples <- bowl %>% 
             rep_sample_n(size = 50, reps = 33)
           View(virtual_samples)
          -

          Observe that while the first 50 rows of replicate are equal to 1 the next 50 are equal to 2. This is indicating that the first 50 rows correspond to the first sample of 50 balls while the next 50 correspond to the second sample of 50 balls. This pattern continues for all reps = 33 replicates and thus virtual_samples has 33 \(\times\) 50 = 1650 rows.

          -

          Let’s now take the data frame virtual_samples with 33 \(\times\) 50 = 1650 rows corresponding to 33 samples of size 50 and compute the resulting 33 proportions red. We’ll use the same dplyr verbs as we did in the previous section, but this time with an additional group_by() the replicate variable. Recall from Section 4.4 that by assigning grouping “meta-data” before summarizing(), we’ll obtain 33 different proportions red:

          +

          Observe that while the first 50 rows of replicate are equal to 1, the next 50 rows of replicate are equal to 2. This is telling us that the first 50 rows correspond to the first sample of 50 balls while the next 50 correspond to the second sample of 50 balls. This pattern continues for all reps = 33 replicates and thus virtual_samples has 33 \(\times\) 50 = 1650 rows.
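As a quick sanity check (our addition, assuming dplyr is loaded along with the virtual_samples data frame just created), we can verify that each of the 33 replicates indeed contains 50 rows:

# Count the rows within each replicate; each of the 33 counts should be 50.
virtual_samples %>% 
  count(replicate)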

          +

Let’s now take the data frame virtual_samples with 33 \(\times\) 50 = 1650 rows corresponding to 33 samples of size 50 balls and compute the resulting 33 proportions red. We’ll use the same dplyr verbs as we did in the previous section, but this time with an additional group_by() of the replicate variable. Recall from Section 4.4 that by assigning grouping “meta-data” with group_by() before summarizing, we’ll obtain 33 different proportions red:

          virtual_prop_red <- virtual_samples %>% 
             group_by(replicate) %>% 
             summarize(red = sum(color == "red")) %>% 
             mutate(prop_red = red / 50)
           View(virtual_prop_red)
          -

          Let’s display only the first 10 out of 33 rows of virtual_prop_red’s contents in Table 8.1.

          +

Let’s display only the first 10 out of 33 rows of virtual_prop_red’s contents in Table 8.3. As one would expect, there is variation in the resulting prop_red proportions red for these first 10 out of 33 repeated/replicated samples.

          +
TABLE 8.3: First 10 out of 33 virtual proportions of 50 balls that are red.
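A minimal way to view these same first 10 rows at the console (our addition; it assumes the virtual_prop_red data frame created above and that dplyr is loaded):

# Display the first 10 of the 33 proportions red.
virtual_prop_red %>% 
  slice(1:10)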

          8.2.2 Using shovel 33 times

          FIGURE 8.8: Distribution of 33 proportions based on 33 samples of size 50

          -

          Observe that occasionally we obtained proportions red that are less than 0.3 = 30%, while occasionally we obtained proportions that are greater than 0.45 = 45%. However, the most frequently occurring proportions red out of 50 balls were between 35% and 40% (for 11 out 33 samples). Why do we have these differences in proportions red? Because of sampling variation.

          -

          Let’s now compare our virtual results with our tactile results from the previous section in Figure 8.9. We see that both histograms, in other words the distribution of the 33 proportions red, are somewhat somewhat similar in their center and spread, although not identical; these slight differences are again due to random variation. Furthermore both distributions are somewhat bell-shaped.

          +

Observe that occasionally we obtained proportions red that are less than 0.3 = 30%, while on the other hand we occasionally obtained proportions that are greater than 0.45 = 45%. However, the most frequently occurring proportions red out of 50 balls were between 35% and 40% (for 11 out of 33 samples). Why do we have these differences in proportions red? Because of sampling variation.

          +

          Let’s now compare our virtual results with our tactile results from the previous section in Figure 8.9. We see that both histograms, in other words the distribution of the 33 proportions red, are somewhat similar in their center and spread although not identical. These slight differences are again due to random variation. Furthermore both distributions are somewhat bell-shaped.

- Two distributions of 33 proportions based on 33 samples of size 50
+ Comparing 33 virtual and 33 tactile proportions red.

- FIGURE 8.9: Two distributions of 33 proportions based on 33 samples of size 50
+ FIGURE 8.9: Comparing 33 virtual and 33 tactile proportions red.

          -
          -

          8.2.3 Using shovel 1000 times

          -

          Now say we want study the variation in proportions red not based on 33 samples but rather a very large number of samples, say 1000 samples. We have two choices at this point. We could make our students manually take 1000 samples of 50 balls and compute the corresponding 1000 proportion red out 50 balls. However, this would be cruel and unusual, as it this would be very tedious and time consuming. This is however where computers excel: for automating long and repetitive tasks and having them performed very quickly. Therefore at this point we will abandon tactile sampling in favor of only virtual sampling. Let’s once again use the rep_sample_n() function with sample size set to 50, but the number of replicates reps = 1000.

          -

          Be sure to scroll through the contents of virtual_samples in RStudio’s spreadsheet viewer.

          +
          +

          8.2.3 Using the virtual shovel 1000 times

          +

Now say we want to study the variation in proportions red not based on 33 repeated/replicated samples, but rather on a very large number of samples, say 1000. We have two choices at this point. We could have our students manually take 1000 samples of 50 balls and compute the corresponding 1000 proportions red out of 50 balls. This would be cruel and unusual however, as it would be very tedious and time-consuming. This is where computers excel: automating long and repetitive tasks while performing them very quickly. Therefore at this point we will abandon tactile sampling in favor of only virtual sampling. Let’s once again use the rep_sample_n() function with the sample size set to 50, but this time with the number of replicates reps = 1000. Be sure to scroll through the contents of virtual_samples in RStudio’s viewer.

          virtual_samples <- bowl %>% 
             rep_sample_n(size = 50, reps = 1000)
           View(virtual_samples)
          -

          Observe that now virtual_samples has 1000 \(\times\) 50 = 50,000 rows, instead of the 33 \(\times\) 50 = 1650 rows from earlier. Using the same code as earlier, let’s take the data frame virtual_samples with 1000 \(\times\) 50 = 50,000 and compute the resulting 33 proportions red.

          +

Observe that now virtual_samples has 1000 \(\times\) 50 = 50,000 rows, instead of the 33 \(\times\) 50 = 1650 rows from earlier. Using the same code as earlier, let’s take the data frame virtual_samples with its 1000 \(\times\) 50 = 50,000 rows and compute the resulting 1000 proportions red.

          virtual_prop_red <- virtual_samples %>% 
             group_by(replicate) %>% 
             summarize(red = sum(color == "red")) %>% 
             mutate(prop_red = red / 50)
           View(virtual_prop_red)

          Observe that we now have 1000 replicates of prop_red, the proportion of 50 balls that are red. Using the same code as earlier, let’s now visualize the distribution of these 1000 replicates of prop_red in a histogram in Figure 8.10.

          +
          ggplot(virtual_prop_red, aes(x = prop_red)) +
             geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
   labs(x = "Proportion of 50 balls that were red")
          @@ -1228,11 +1222,12 @@ 

          8.2.3 Using shovel 1000 times

          -

          Once again, the most frequently occuring proportions red occur between 35% and 40%. Every now and then, we’d obtain proportions are low as between 20% and 25%, and others as high as between 55% and 60%, but those are rarities. Furthermore observe that we now have much more symmetric and smoother bell-shaped distribution. This distribution is in fact a Normal distribution; see Appendix A for a brief discussion on properties of the Normal distribution.

          +

          Once again, the most frequently occurring proportions red occur between 35% and 40%. Every now and then, we obtain proportions as low as between 20% and 25%, and others as high as between 55% and 60%. These are rare however. Furthermore observe that we now have a much more symmetric and smoother bell-shaped distribution. This distribution is in fact a Normal distribution; see Appendix A for a brief discussion on properties of the Normal distribution.
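To see the bell shape more explicitly, one could overlay a Normal curve whose mean and standard deviation match the 1000 simulated proportions. This sketch is our addition, not part of the original text; it assumes a recent ggplot2 and the virtual_prop_red data frame from above:

# Compute the mean and standard deviation of the 1000 proportions red.
params <- virtual_prop_red %>% 
  summarize(mean = mean(prop_red), sd = sd(prop_red))

# Histogram on the density scale with a matching Normal curve overlaid.
ggplot(virtual_prop_red, aes(x = prop_red)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.05, 
                 boundary = 0.4, color = "white") +
  stat_function(fun = dnorm, 
                args = list(mean = params$mean, sd = params$sd)) +
  labs(x = "Proportion of 50 balls that were red")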

          +

          8.2.4 Using different shovels

          -

          We ask ourselves a question now. Say you had three choices of shovels to extract a sample of balls and compute the corresponding proportion of balls in the shovel that are red:

          +

          Now say instead of just one shovel, you had three choices of shovels to extract a sample of balls with.

          @@ -1249,7 +1244,7 @@

          8.2.4 Using different shovels

          -

          Which would you choose? In our experience, most people would choose the shovel with 100 slots since it has the biggest sample size, and thus would yield the “best” guess of the proportion of the bowl’s 2400 balls that are red. The three shovels above present with three possible sample sizes. Using our newly developed tools for virtual sampling simulations, let’s unpack the effect of having different sample sizes! In other words, for size = 25, size = 50, and size = 100:

          +

          If your goal was still to estimate the proportion of the bowl’s balls that were red, which shovel would you choose? In our experience, most people would choose the shovel with 100 slots since it has the biggest sample size and hence would yield the “best” guess of the proportion of the bowl’s 2400 balls that are red. Using our newly developed tools for virtual sampling simulations, let’s unpack the effect of having different sample sizes! In other words, let’s use rep_sample_n() with size = 25, size = 50, and size = 100, while keeping the number of repeated/replicated samples at 1000:

1. Virtually use the appropriate shovel to generate 1000 samples with size balls.
2. Compute the resulting 1000 replicates of the proportion of the shovel’s balls that are red.
3. Visualize the variation of these 1000 proportions red using a histogram (a code sketch for the first two steps follows this list).
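The code that produces the three sets of 1000 proportions (virtual_prop_red_25, virtual_prop_red_50, and virtual_prop_red_100, used below) is not shown at this point; a minimal sketch of how they could be created, following the same pattern as earlier, assuming the bowl data frame and rep_sample_n() from moderndive:

# n = 25: 1000 virtual samples of 25 balls each, then the proportion red
# within each of the 1000 replicates.
virtual_prop_red_25 <- bowl %>% 
  rep_sample_n(size = 25, reps = 1000) %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 25)

# n = 50: same pattern with the 50-slot shovel.
virtual_prop_red_50 <- bowl %>% 
  rep_sample_n(size = 50, reps = 1000) %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 50)

# n = 100: same pattern with the 100-slot shovel.
virtual_prop_red_100 <- bowl %>% 
  rep_sample_n(size = 100, reps = 1000) %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 100)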

            8.2.4 Using different shovels

          -

          Observe that as the sample size increases, the spread of the 1000 replicates of the proportion red decreases. In other words, as the sample size increases, there are less differences due to sampling variation, and the distribution centers more tightly around the same value. Eyeballing Figure 8.11, things appear to center more tightly around roughly 40%.

          -

          We can be numerically explicit about the amount of spread using the standard deviation: a summary statistic that measures the amount of spread and variation within a numerical variable; see Appendix A for a brief discussion on properties of the standard deviation. For all three sample sizes, compute the standard deviation of sd() of the 1000 proportions red by running the following data wrangling code.

          +

Observe that as the sample size increases, the spread of the 1000 replicates of the proportion red decreases. In other words, as the sample size increases, there are fewer differences due to sampling variation and the distribution centers more tightly around the same value. Eyeballing Figure 8.11, things appear to center tightly around roughly 40%.

          +

          We can be numerically explicit about the amount of spread in our 3 sets of 1000 values of prop_red using the standard deviation: a summary statistic that measures the amount of spread and variation within a numerical variable; see Appendix A for a brief discussion on properties of the standard deviation. For all three sample sizes, let’s compute the standard deviation of the 1000 proportions red by running the following data wrangling code that uses the sd() summary function.

# n = 25
virtual_prop_red_25 %>% 
  summarize(sd = sd(prop_red))

# n = 50
virtual_prop_red_50 %>% 
  summarize(sd = sd(prop_red))

# n = 100
virtual_prop_red_100 %>% 
  summarize(sd = sd(prop_red))
          -

          Let’s compare these 3 measures of spread of the distributions we in Table 8.4.

          +

          Let’s compare these 3 measures of spread of the distributions in Table 8.4.

          @@ -1369,288 +1364,295 @@

          8.2.4 Using different shovels

          -
          -

          8.3 Our goal

          -

          Simply put: study the effects of sampling variation

          -
          -

          8.3.1 What is sampling variation?

          -
          -
          -

          8.3.2 Effect of sample size

          -
          -
          -
          -

          8.4 Sampling framework

          -
          -

          8.4.1 Terminology

          -

          Let’s now define some concepts and terminology important to understand sampling, being sure to tie things back to the above example. You might have to read this a couple times more as you progress throughout this book, as they are very deeply layered concepts. However as we’ll soon see, they are very powerful concepts that open up a whole new world of scientific thinking:

          +

          8.3 Sampling framework

          +

In both our “hands-on” tactile simulations and our “virtual” simulations using a computer, we used sampling for the purpose of estimation: we extracted samples in order to estimate the proportion of the bowl’s balls that are red. We used sampling as a cheaper and less time-consuming approach than performing a full census of all the balls. Our virtual simulations all built up to the results shown in Figure 8.11 and Table 8.4, comparing 1000 proportions red based on samples of size 25, 50, and 100. This was our first attempt at understanding two key concepts relating to sampling for estimation:

-

1. Population: The population is a set of \(N\) observations of interest.
   • Above Ex: Our bowl consisting of \(N=2400\) identically-shaped balls.
2. Population parameter: A population parameter is a numerical summary value about the population. In most settings, this is a value that’s unknown and you wish you knew it.
   • Above Ex: The true population proportion \(p\) of the balls in the bowl that are red.
   • In this scenario the parameter of interest is the proportion, but in others it could be numerical summary values like the mean, median, etc.
3. Census: An exhaustive enumeration/counting of all observations in the population in order to compute the population parameter’s numerical value exactly.
   • Above Ex: This corresponds to manually going over all \(N=2400\) balls and counting the number that are red, thereby allowing us to compute the population proportion \(p\) of the balls that are red exactly.
   • When \(N\) is small, a census is feasible. However, when \(N\) is large, a census can get very expensive, either in terms of time, energy, or money.
   • Ex: the Decennial United States census attempts to exhaustively count the US population. Consequently it is a very expensive, but necessary, procedure.
4. Sampling: Collecting a sample of size \(n\) of observations from the population. Typically the sample size \(n\) is much smaller than the population size \(N\), thereby making sampling a much cheaper procedure than a census.
   • Above Ex: Using the shovel to extract a sample of \(n=50\) balls.
   • It is important to remember that the lowercase \(n\) corresponds to the sample size and uppercase \(N\) corresponds to the population size, thus \(n \leq N\).
5. Point estimates/sample statistics: A summary statistic based on the sample of size \(n\) that estimates the unknown population parameter.
   • Above Ex: it’s the sample proportion \(\widehat{p}\) red of the balls in the sample of size \(n=50\).
   • Key: The sample proportion red \(\widehat{p}\) is an estimate of the true unknown population proportion red \(p\).
6. Representative sampling: A sample is said to be a representative sample if it “looks like the population.” In other words, the sample’s characteristics are a good representation of the population’s characteristics.
   • Above Ex: Does our sample of \(n=50\) balls “look like” the contents of the larger set of \(N=2400\) balls in the bowl?
7. Generalizability: We say a sample is generalizable if any results based on the sample can generalize to the population.
   • Above Ex: Is \(\widehat{p}\) a “good guess” of \(p\)?
   • In other words, can we infer about the true proportion of the balls in the bowl that are red, based on the results of our sample of \(n=50\) balls?
8. Bias: In a statistical sense, we say bias occurs if certain observations in a population have a higher chance of being sampled than others. We say a sampling procedure is unbiased if every observation in a population had an equal chance of being sampled.
   • Above Ex: Did each ball, irrespective of color, have an equal chance of being sampled, meaning the sampling was unbiased? We feel since the balls are all of the same size, there isn’t any bias in the sampling. If, say, the red balls had a much larger diameter than the white ones, then you might have a higher or lower probability of sampling red balls.
9. Random sampling: We say a sampling procedure is random if we sample randomly from the population in an unbiased fashion.
   • Above Ex: As long as you mixed the bowl sufficiently before sampling, your samples of size \(n=50\) balls would be random.

+

1. The effect of sampling variation on our estimates.
2. The effect of sample size on sampling variation.

-

          8.4.2 Sampling for inference

          -

          Why did we go through the trouble of enumerating all the above concepts and terminology?

          -

          The moral of the story:

          +

Let’s now introduce some terminology and notation, as well as statistical definitions, related to sampling. Given the number of new words to learn, you will likely have to read these next three subsections multiple times. Keep in mind however that the concepts underlying this terminology, notation, and these definitions are no different from the concepts underlying our simulations in Sections 8.1 and 8.2; it will simply take time and practice to master them.

          +
          +

          8.3.1 Terminology & notation

          +

          Here is a list of terminology and mathematical notation relating to sampling. For each item, we’ll be sure to tie them to our simulations in Sections 8.1 and 8.2.

          +
+

1. (Study) Population: A (study) population is a collection of individuals or observations about which we are interested. We mathematically denote the population’s size using upper case \(N\). In our simulations the (study) population was the collection of \(N\) = 2400 identically sized red and white balls contained in the bowl.
2. Population parameter: A population parameter is a numerical summary quantity about the population that is unknown, but you wish you knew. For example, when this quantity is a mean, the population parameter of interest is the population mean, which is mathematically denoted with the Greek letter \(\mu\) (pronounced “mu”). In our simulations, however, since we were interested in the proportion of the bowl’s balls that were red, the population parameter is the population proportion, which is mathematically denoted with the letter \(p\).
3. Census: An exhaustive enumeration or counting of all \(N\) individuals or observations in the population in order to compute the population parameter’s value exactly. In our simulations, this would correspond to manually going over all \(N\) = 2400 balls in the bowl, counting the number that are red, and computing the population proportion \(p\) of the balls that are red exactly. When the number \(N\) of individuals or observations in our population is large, as was the case with our bowl, a census can be very expensive in terms of time, energy, and money.
4. Sampling: Sampling is the act of collecting a sample from the population when we don’t have the means to perform a census. We mathematically denote the sample’s size using lower case \(n\), as opposed to upper case \(N\) which denotes the population’s size. Typically the sample size \(n\) is much smaller than the population size \(N\), thereby making sampling a much cheaper procedure than a census. In our simulations, we used shovels with 25, 50, and 100 slots to extract samples of size \(n\) = 25, \(n\) = 50, and \(n\) = 100 balls.
5. Point estimate (AKA sample statistic): A summary statistic computed from the sample that estimates the unknown population parameter. In our simulations, recall that the unknown population parameter was the population proportion, mathematically denoted with \(p\). Our point estimate is the sample proportion: the proportion of the shovel’s balls that are red. In other words, it is our guess of the proportion of the bowl’s balls that are red. We mathematically denote the sample proportion using \(\widehat{p}\); the “hat” on top of the \(p\) indicates that it is an estimate of the unknown population proportion \(p\).
6. Representative sampling: A sample is said to be a representative sample if it is representative of the population. In other words, are the sample’s characteristics a good representation of the population’s characteristics? In our simulations, are the samples of \(n\) balls extracted using our shovels representative of the bowl’s \(N\) = 2400 balls?
7. Generalizability: We say a sample is generalizable if any results based on the sample can generalize to the population. In other words, can the value of the point estimate be generalized to estimate the value of the population parameter well? In our simulations, can we generalize the sample proportions red of our shovels to the population proportion red of the bowl? Using mathematical notation, is \(\widehat{p}\) a “good guess” of \(p\)?
8. Bias: In a statistical sense, we say bias occurs if certain individuals or observations in a population have a higher chance of being included in a sample than others. We say a sampling procedure is unbiased if every observation in a population has an equal chance of being sampled. In our simulations, since each ball had the same size and hence an equal chance of being sampled by our shovels, our samples were unbiased.
9. Random sampling: We say a sampling procedure is random if we sample randomly from the population in an unbiased fashion. In our simulations, this would correspond to sufficiently mixing the bowl before each use of the shovel.

          Phew, that’s a lot of new terminology and notation to learn! Let’s put them all together to describe the paradigm of sampling:

• If the sampling of a sample of size \(n\) is done at random, then
-
• The sample is unbiased and representative of the population, thus
• Any result based on the sample can generalize to the population, thus
• The point estimate/sample statistic is a “good guess” of the unknown population parameter of interest
+
• the sample is unbiased and representative of the population of size \(N\), thus
• any result based on the sample can generalize to the population, thus
• the point estimate is a “good guess” of the unknown population parameter, thus
• instead of performing a census, we can infer about the population using sampling.
          -

          and thus we have inferred about the population based on our sample. In the above example:

          +

          Restricting consideration to a shovel with 50 slots from our simulations,

-
• If we properly mix the balls by say stirring the bowl first, then use the shovel to extract a sample of size \(n=50\), then
• The contents of the shovel will “look like” the contents of the bowl, thus
• Any results based on the sample of \(n=50\) balls can generalize to the large bowl of \(N=2400\) balls, thus
• The sample proportion \(\widehat{p}\) of the \(n=50\) balls in the shovel that are red is a “good guess” of the true population proportion \(p\) of the \(N=2400\) balls that are red.
+
• If we extract a sample of \(n=50\) balls at random, in other words we mix the equally-sized balls before using the shovel, then
• the contents of the shovel are an unbiased representation of the contents of the bowl’s 2400 balls, thus
• any result based on the sample of balls can generalize to the bowl, thus
• the sample proportion \(\widehat{p}\) of the \(n=50\) balls in the shovel that are red is a “good guess” of the population proportion \(p\) of the \(N\)=2400 balls that are red, thus
• instead of manually going over all the balls in the bowl, we can infer about the bowl using the shovel.
          -

          and thus we have inferred some new piece of information about the bowl based on our sample extracted by shovel.

          +

          Note that last word we wrote in bold: infer. The act of “inferring” is to deduce or conclude (information) from evidence and reasoning. In our simulations, we wanted to infer about the proportion of the bowl’s balls that are red. Statistical inference is the theory, methods, and practice of forming judgments about the parameters of a population and the reliability of statistical relationships, typically on the basis of random sampling (Wikipedia). In other words, statistical inference is the act of inference via sampling. In the upcoming Chapter 9 on confidence intervals, we’ll introduce the infer package, which makes statistical inference “tidy” and transparent. It is why this third portion of the book is called “Statistical inference via infer”.

          -

          8.4.3 Statistical definitions

          -

          Sampling distributions are a specific kind of distribution: distributions of point estimates/sample statistics based on samples of size \(n\) used to estimate an unknown population parameter.

          -

          In the case of the histogram in Figure 8.7, its the distribution of the sample proportion red \(\widehat{p}\) based on \(n=50\) sampled balls from the bowl, for which we want to estimate the unknown population proportion \(p\) of the \(N=2400\) balls that are red. Sampling distributions describe how values of the sample proportion red \(\widehat{p}\) will vary from sample to sample due to sampling variability and thus identify “typical” and “atypical” values of \(\widehat{p}\). For example

          -
            -
          • Obtaining a sample that yields \(\widehat{p} = 0.36\) would be considered typical, common, and plausible since it would in theory occur frequently.
          • -
          • Obtaining a sample that yields \(\widehat{p} = 0.8\) would be considered atypical, uncommon, and implausible since it lies far away from most of the distribution.
          • -
          -

          Let’s now ask ourselves the following questions:

          -
            -
          1. Where is the sampling distribution centered?
          2. -
          3. What is the spread of this sampling distribution?
          4. -
          -

          Recall from Section 4.3 the mean and the standard deviation are two summary statistics that would answer this question:

          -
          tactile_prop_red %>% 
          -  summarize(mean = mean(prop_red), sd = sd(prop_red))
          -
- TABLE 8.4: Comparing the standard deviations of the proportion red for different sample sizes.
- Columns: sample size, standard deviation
+ TABLE 8.4: Comparing standard deviations of proportions red for 3 different shovels.
+ Columns: Number of slots in shovel, Standard deviation of proportions red
          +

          8.3.2 Statistical definitions

          +

          Now for some important statistical definitions related to sampling. As a refresher of our 1000 repeated/replicated virtual samples of size \(n\) = 25, \(n\) = 50, and \(n\) = 100 in Section 8.2, let’s display Figure 8.11 again below.

          +

          +

          These types of distributions have a special name: sampling distributions; their visualization displays the effect of sampling variation on the distribution of any point estimate, in this case the sample proportion \(\widehat{p}\). Using these sampling distributions, for a given sample size \(n\), we can make statements about what values we can typically expect. For example, observe the centers of all three sampling distributions: they are all roughly centered around 0.4 = 40%. Furthermore, observe that while we are somewhat likely to observe sample proportions red of 0.2 = 20% when using the shovel with 25 slots, we will almost never observe this sample proportion when using the shovel with 100 slots. Observe also the effect of sample size on the sampling variation. As the sample size \(n\) increases from 25 to 50 to 100, the spread/variation of the sampling distribution decreases and thus the values cluster more and more tightly around the same center of around 40%. We quantified this spread/variation using the standard deviation of our proportions in Table 8.4, which we display again below:

          +
-
   mean     sd
  0.356  0.058

+
Number of slots in shovel   Standard deviation of proportions red
                       25                                    0.099
                       50                                    0.071
                      100                                    0.048
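As an alternative to the three separate summarize() calls shown in Section 8.2.4 (our addition, not the book’s code), the three standard deviations can be computed in a single pipeline, assuming the three data frames from the sketch given earlier:

# Stack the three sets of 1000 proportions, labelling each by its sample
# size, then compute one standard deviation per sample size.
bind_rows(
  virtual_prop_red_25 %>% mutate(n = 25),
  virtual_prop_red_50 %>% mutate(n = 50),
  virtual_prop_red_100 %>% mutate(n = 100)
) %>% 
  group_by(n) %>% 
  summarize(sd = sd(prop_red))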
          -

          Finally, it’s important to keep in mind:

          -
            -
          1. If the sampling is done in an unbiased and random fashion, in other words we made sure to stir the bowl before we sampled, then the sampling distribution will be guaranteed to be centered at the true unknown population proportion red \(p\), or in other words the true number of balls out of 2400 that are red.
          2. -
          3. The spread of this histogram, as quantified by the standard deviation of 0.058, is called the standard error. It quantifies the uncertainty of our estimates of \(p\), which recall are called \(\widehat{p}\). -
              -
            • Note: A large source of confusion. All standard errors are a form of standard deviation, but not all standard deviations are standard errors.
            • -
          4. -
          -
            -
          • sampling distribution
          • -
          • standard error
          • -
          - -

          Now let’s mimic the above tactile sampling, but with virtual sampling. We’ll resort to virtual sampling because while collecting 33 tactile samples manually is feasible, for large numbers like 1000, things start getting tiresome! That’s where a computer can really help: computers excel at performing mundane tasks repeatedly; think of what accounting software must be like!

          -

          In Figure 8.8, we can start seeing a pattern in the sampling distribution emerge. However, 33 values of the sample proportion \(\widehat{p}\) might not be enough to get a true sense of the distribution. Using 1000 values of \(\widehat{p}\) would definitely give a better sense. What are our two options for constructing these histograms?

          -
            -
          1. Tactile sampling: Make the 33 groups of students take \(1000 / 33 \approx 31\) samples of size \(n=50\) each, count the number of red balls for each of the 1000 tactile samples, and then compute the 1000 corresponding values of the sample proportion \(\widehat{p}\). However, this would be cruel and unusual as this would take hours!
          2. -
          3. Virtual sampling: Computers are very good at automating repetitive tasks such as this one. This is the way to go!
          4. -
          -

          First, generate 1000 samples of size \(n=50\)

          -
          virtual_samples <- bowl %>% 
          -  rep_sample_n(size = 50, reps = 1000)
          -View(virtual_samples)
          -

          Then for each of these 1000 samples of size \(n=50\), compute the corresponding sample proportions

          -
          virtual_prop_red <- virtual_samples %>% 
          -  group_by(replicate) %>% 
          -  summarize(red = sum(color == "red")) %>% 
          -  mutate(prop_red = red / 50)
          -View(virtual_prop_red)
          -

          As previously done, let’s plot the sampling distribution of these 1000 simulated values of the sample proportion red \(\widehat{p}\) with a histogram in Figure 8.10.

          -
          ggplot(virtual_prop_red, aes(x = prop_red)) +
          -  geom_histogram(binwidth = 0.05, color = "white") +
          -  labs(x = "Sample proportion red based on n = 50", 
          -       title = "Sampling distribution of p-hat") 
          -
          -Sampling distribution of 1000 sample proportions based on 1000 tactile samples with n=50 +

          So as the number of slots in the shovel increased, this standard deviation decreased. These types of standard deviations have another special name: standard errors; they quantify the effect of sampling variation induced on our estimates. In other words, they are quantifying how much we can expect different proportions of a shovel’s balls that are red to vary from random sample to random sample.

          +

          Unfortunately, many new statistics practitioners get confused by these names. For example, it’s common for people new to statistical inference to call the “sampling distribution” the “sample distribution”. Another additional source of confusion is the name “standard deviation” and “standard error”. Remember that a standard error is merely a kind of standard deviation: the standard deviation of any point estimate from a sampling scenario. In other words, all standard errors are standard deviations, but not all standard deviations are a standard error.

          +

          To help reinforce these concepts, let’s re-display Figure 8.11 but using our new terminology, notation, and definitions relating to sampling in Figure 8.12.

          +
          +Three sampling distributions of the sample proportion $\widehat{p}$.

- FIGURE 8.12: Sampling distribution of 1000 sample proportions based on 1000 tactile samples with n=50
+ FIGURE 8.12: Three sampling distributions of the sample proportion \(\widehat{p}\).

          -

          Since the sampling is random and thus representative and unbiased, the above sampling distribution is centered at the true population proportion red \(p\) of all \(N=2400\) balls in the bowl. Eyeballing it, the sampling distribution appears to be centered at around 0.375.

          -

          What is the standard error of the above sampling distribution of \(\widehat{p}\) based on 1000 samples of size \(n=50\)?

          -
          virtual_prop_red %>% 
          -  summarize(SE = sd(prop_red))
          -
          # A tibble: 1 x 1
          -      SE
          -   <dbl>
          -1 0.0702
          -

          What this value is saying might not be immediately apparent by itself to someone who is new to sampling. It’s best to first compare different standard errors for different sampling schemes based on different sample sizes \(n\). We’ll do so for samples of size \(n=25\), \(n=50\), and \(n=100\) next.

          -
          +

          Furthermore, let’s re-display Table 8.4 but using our new terminology, notation, and definitions relating to sampling in Table 8.5.

+
TABLE 8.5: Three standard errors of the sample proportion \(\widehat{p}\) based on n = 25, 50, 100.

Sample size   Standard error of \(\widehat{p}\)
n = 25        0.099
n = 50        0.071
n = 100       0.048

          Remember the key message of this last table: that as the sample size \(n\) goes up, the “typical” error of your point estimate as quantified by the standard error will go down.
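As a preview of the mathematical formulas mentioned at the close of this chapter (our addition at this point, not part of the original passage), the standard error of a sample proportion can be approximated by \(\sqrt{p(1-p)/n}\). Plugging in the bowl’s true proportion \(p = 0.375\) (revealed in the next subsection) reproduces the simulated values above quite closely: \(\sqrt{0.375 \times 0.625 / 25} \approx 0.097\), \(\sqrt{0.375 \times 0.625 / 50} \approx 0.068\), and \(\sqrt{0.375 \times 0.625 / 100} \approx 0.048\).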

          +
          +

          8.3.3 The moral of the story

          +

          Let’s recap this section so far. We’ve seen that if a sample is generated at random, then the resulting point estimate is a “good guess” of the true unknown population parameter. In our simulations, since we made sure to mix the balls first before extracting a sample with the shovel, the resulting sample proportion \(\widehat{p}\) of the shovel’s balls that were red was a “good guess” of the population proportion \(p\) of the bowl’s balls that were red.

          +

However, what do we mean by our point estimate being a “good guess”? Sometimes we’ll obtain a point estimate less than the true value of the unknown population parameter, while other times we’ll obtain a point estimate greater than the true value; this is because of sampling variation. However, despite this sampling variation, our point estimates will “on average” be correct. In our simulations, sometimes our sample proportion \(\widehat{p}\) was less than the true population proportion \(p\), while other times the sample proportion \(\widehat{p}\) was greater than the true population proportion \(p\). This was due to the sampling variability induced by the mixing. However, despite this sampling variation, our sample proportions \(\widehat{p}\) were always centered around the true population proportion. This is also known as having an accurate estimate.

          +

What was the value of the population proportion \(p\) of the \(N\) = 2400 balls in the actual bowl? There were 900 red balls, for a proportion red of 900/2400 = 0.375 = 37.5%! How do we know this? Did the authors do an exhaustive count of all the balls? No! The counts were listed on the contents of the box that the bowl came in. Hence we made the contents of the virtual bowl match the contents of the tactile bowl:

          +
          bowl %>% 
          +  summarize(sum_red = sum(color == "red"), 
          +            sum_not_red = sum(color != "red"))
          +
          # A tibble: 1 x 2
          +  sum_red sum_not_red
          +    <int>       <int>
          +1     900        1500
          +

          Let’s re-display our sampling distributions from Figures 8.11 and 8.12, but now with a vertical red line marking the true population proportion \(p\) of balls that are red = 37.5% in Figure 8.13. We see that while there is a certain amount of error in the sample proportions \(\widehat{p}\) for all three sampling distributions, on average the \(\widehat{p}\) are centered at the true population proportion red \(p\).

          +
          +Three sampling distributions with population proportion $p$ marked in red. +

          +FIGURE 8.13: Three sampling distributions with population proportion \(p\) marked in red. +

          -
          -

          8.5 Interpretation

          -

          At this point, you might be saying to yourself: “Big deal, why do we care about this bowl?” As hopefully you’ll soon come to appreciate, this sampling bowl exercise is merely a simulation representing the reality of many important sampling scenarios in a simplified and accessible setting. One in particular sampling scenario is familiar to many: polling. Whether for market research or for political purposes, polls inform much of the world’s decision and opinion making, and understanding the mechanism behind them can better inform you statistical citizenship. We’ll tie-in everything we learn in this chapter with an example relating to a 2013 poll on President Obama’s approval ratings among young adults in Section ??.

          +

          We also saw in this section that as your sample size \(n\) increases, your point estimates will vary less and less and be more and more concentrated around the true population parameter; this is quantified by the decreasing standard error. In other words, the typical error of your point estimates will decrease. In our simulations, as the sample size increases, the spread/variation of our sample proportions \(\widehat{p}\) around the true population proportion \(p\) decreases. You can observe this behavior as well in Figure 8.13. This is also known as having a more precise estimate.


          So random sampling ensures our point estimates are accurate, while having a large sample size ensures our point estimates are precise. While accuracy and precision may sound like the same concept, they are actually not. Accuracy relates to how “on target” our estimates are whereas precision relates to how “consistent” our estimates are. Figure 8.14 illustrates the difference.

FIGURE 8.14: Comparing accuracy and precision

At this point you might be asking yourself: “If you already knew the true proportion of the bowl’s balls that are red was 37.5%, then why did we do any of this?” In other words, “If you already knew the value of the true unknown population parameter, then why did we do any sampling?” You might also be asking: “Why did we take 1000 repeated/replicated samples of size n = 25, 50, and 100? Shouldn’t we be taking only one sample that’s as large as possible?” Recall our definition of a simulation from Section 8.2: an approximate imitation of the operation of a process or system. We performed these simulations to study:

1. The effect of sampling variation on our estimates.
2. The effect of sample size on sampling variation.

          In a real-life scenario, we won’t know what the true value of the population parameter is and furthermore we won’t take repeated/replicated samples but rather a single sample that’s as large as we can afford. This was also done to show the power of the technique of sampling when trying to estimate a population parameter. Since we knew the value was 37.5%, we could show just how well the different sample sizes approximated this value in their sampling distributions. We present one case study of a real-life sampling scenario in the next section: polling.




          8.4 Case study: Polls


On December 4, 2013, National Public Radio in the US reported on a then-recent poll of President Obama’s approval rating among young Americans aged 18-29 in an article, Poll: Support For Obama Among Young Americans Eroding. A quote from the article:

          After voting for him in large numbers in 2008 and 2012, young Americans are souring on President Obama.

          According to a new Harvard University Institute of Politics poll, just 41 percent of millennials — adults ages 18-29 — approve of Obama’s job performance, his lowest-ever standing among the group and an 11-point drop from April.


Let’s tie elements of the real-life poll in this news article with our “tactile” and “virtual” simulations from Sections 8.1 and 8.2, using the terminology, notation, and definitions we learned in Section 8.3:

1. (Study) Population: Who is the population of \(N\) individuals or observations of interest?
  • Simulation: \(N\) = 2400 identically-sized red and white balls
  • Obama poll: \(N\) = ? young Americans aged 18-29
2. Population parameter: What is the population parameter?
  • Simulation: The population proportion \(p\) of ALL the balls in the bowl that are red.
  • Obama poll: The population proportion \(p\) of ALL young Americans who approve of Obama’s job performance.
3. Census: What would a census look like?
  • Simulation: Manually going over all \(N\) = 2400 balls and exactly computing the population proportion \(p\) of the balls that are red, a time-consuming task.
  • Obama poll: Locating all \(N\) = ? young Americans and asking them all if they approve of Obama’s job performance, an expensive task.
4. Sampling: How do you collect the sample of size \(n\) individuals or observations?
  • Simulation: Using a shovel with \(n\) slots.
  • Obama poll: One method is to get a list of phone numbers of all young Americans and pick out \(n\) phone numbers. In this poll’s case, the sample size was \(n\) = 2089 young Americans.
5. Point estimate (AKA sample statistic): What is your estimate of the unknown population parameter?
  • Simulation: The sample proportion \(\widehat{p}\) of the balls in the shovel that were red.
  • Obama poll: The sample proportion \(\widehat{p}\) of young Americans in the sample that approve of Obama’s job performance. In this poll’s case, \(\widehat{p}\) = 0.41 = 41%, the quoted percentage in the second paragraph of the article.
6. Representative sampling: Is the sampling procedure representative?
  • Simulation: Are the contents of the shovel representative of the contents of the bowl?
  • Obama poll: Is the sample of \(n\) = 2089 young Americans representative of all young Americans aged 18-29?
7. Generalizability: Are the samples generalizable to the greater population?
  • Simulation: Is the sample proportion \(\widehat{p}\) of the shovel’s balls that are red a “good guess” of the population proportion \(p\) of the bowl’s balls that are red?
  • Obama poll: Is the sample proportion \(\widehat{p}\) = 0.41 of the sample of young Americans who support Obama a “good guess” of the population proportion \(p\) of all young Americans who support Obama? In other words, can we confidently say that 41% of all young Americans approve of Obama?
8. Bias: Is the sampling procedure unbiased? In other words, do all observations have an equal chance of being included in the sample?
  • Simulation: Since each ball was equally sized, each ball had an equal chance of being included in a shovel’s sample, and hence the sampling was unbiased.
  • Obama poll: Did all young Americans have an equal chance at being represented in this poll? For example, if this was conducted using only mobile phone numbers, would people without mobile phones be included? What if those who disapproved of Obama were less likely to agree to take part in the poll? What about if this were an internet poll on a certain news website? Would non-readers of this website be included? We need to ask the Harvard University Institute of Politics pollsters about their sampling methodology.
9. Random sampling: Was the sampling random?
  • Simulation: As long as you mixed the bowl sufficiently before sampling, your samples would be random.
  • Obama poll: Was the sample conducted at random? We need to ask the Harvard University Institute of Politics pollsters about their sampling methodology.


          Once again, let’s revisit the sampling paradigm:

• If the sampling of a sample of size \(n\) is done at random, then
• the sample is unbiased and representative of the population of size \(N\), thus
• any result based on the sample can generalize to the population, thus
• the point estimate is a “good guess” of the unknown population parameter, thus
• instead of performing a census, we can infer about the population using sampling.


          In our simulations using the shovel with 50 slots:

• If we extract a sample of \(n\) = 50 balls at random, in other words we mix the equally-sized balls before using the shovel, then
• the contents of the shovel are an unbiased representation of the contents of the bowl’s 2400 balls, thus
• any result based on the sample of balls can generalize to the bowl, thus
• the sample proportion \(\widehat{p}\) of the \(n\) = 50 balls in the shovel that are red is a “good guess” of the population proportion \(p\) of the \(N\) = 2400 balls that are red, thus
• instead of manually going over all the balls in the bowl, we can infer about the bowl using the shovel.


In the real-life Obama poll:

• If we had a way of contacting a randomly chosen sample of 2089 young Americans and polling their approval of Obama, then
• these 2089 young Americans would be an unbiased and representative sample of all young Americans, thus
• any results based on this sample of 2089 young Americans can generalize to the entire population of all young Americans, thus
• the reported sample approval rating of 41% of these 2089 young Americans is a “good guess” of the true approval rating among all young Americans, thus
• instead of performing a highly costly census of all young Americans, we can infer about all young Americans using polling.

So long story short, this poll’s guess of Obama’s approval rating was 41%. However, is this the end of the story when it comes to understanding the results of a poll? If you read further in the article, it states:


          The online survey of 2,089 adults was conducted from Oct. 30 to Nov. 11, just weeks after the federal government shutdown ended and the problems surrounding the implementation of the Affordable Care Act began to take center stage. The poll’s margin of error was plus or minus 2.1 percentage points.


Note the term margin of error, which here is plus or minus 2.1 percentage points. This is saying that a typical range of errors for polls of this type is about \(\pm 2.1\%\), in other words from about 2.1% too small to about 2.1% too big. These errors are caused by sampling variation, the same sampling variation you saw studied in the histograms in Section ?? on our tactile sampling simulations and Section ?? on our virtual sampling simulations.


In the case of polls, any variation from the true approval rating is an “error”, and a reasonable range of errors is the margin of error. We’ll see in the next chapter that this is what’s known as a 95% confidence interval for the unknown approval rating. We’ll study confidence intervals using a new package for our data science and statistical toolbox: the infer package for statistical inference.



          8.5 Conclusion


          8.5.1 Central Limit Theorem


What you did in Sections 8.1 and 8.2 (in particular in Figure 8.11 and Table 8.4) was demonstrate a very famous theorem, or mathematically proven truth, called the Central Limit Theorem. It loosely states that when sample means and sample proportions are based on larger and larger sample sizes, the sampling distributions of these two point estimates become more and more normally shaped and more and more narrow. In other words, their sampling distributions become more normally distributed and the spread/variation of these sampling distributions, as quantified by their standard errors, gets smaller. Shuyi Chiou, Casey Dunn, and Pathikrit Bhattacharyya created the following 3m38s video at https://www.youtube.com/embed/jvoxEYmQHNM explaining this crucial statistical theorem using the average weight of wild bunny rabbits and the average wing span of dragons as examples. Enjoy!
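As a rough illustration of the Central Limit Theorem in action, and assuming the pennies data frame with its age_in_2011 variable and rep_sample_n() from the moderndive package, one could compare sample means based on a small and a larger sample size:

library(dplyr)
library(ggplot2)
library(moderndive)

# Sample means based on n = 5 vs n = 40 pennies: the n = 40 sampling
# distribution should look more bell-shaped and less spread out.
sample_means <- bind_rows(
  rep_sample_n(pennies, size = 5, reps = 1000) %>% mutate(n = 5),
  rep_sample_n(pennies, size = 40, reps = 1000) %>% mutate(n = 40)
) %>%
  group_by(n, replicate) %>%
  summarize(mean_age = mean(age_in_2011))

ggplot(sample_means, aes(x = mean_age)) +
  geom_histogram(bins = 20, color = "white") +
  facet_wrap(~ n)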



          8.5.2 Summary table


In this chapter, we performed both tactile and virtual simulations of sampling to infer about an unknown proportion. We also presented a case study of sampling in a real-life situation: polls. In both cases, we used the sample proportion \(\widehat{p}\) to estimate the population proportion \(p\). However, we are not just limited to scenarios related to statistical inference for proportions. In other words, we can consider population parameter and point estimate scenarios other than the population proportion \(p\) and sample proportion \(\widehat{p}\) we studied in this chapter. We present 5 more such scenarios in Table 8.6.


Note that the sample mean is traditionally denoted as \(\overline{x}\), but since it can also be thought of as an estimate of the population mean \(\mu\), it can also be denoted as \(\widehat{\mu}\), as shown below in the table.
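For instance, as a small sketch assuming the pennies_sample data frame from the moderndive package with its age_in_2011 variable, the sample mean \(\overline{x}\) (equivalently \(\widehat{\mu}\)) is computed as:

library(dplyr)
library(moderndive)

# The sample mean x-bar of the pennies in pennies_sample, our point
# estimate of the population mean age mu of all pennies.
pennies_sample %>%
  summarize(x_bar = mean(age_in_2011))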

          @@ -1776,42 +1778,35 @@


TABLE 8.6: Scenarios of sampling for inference


          We’ll cover all the remaining scenarios as follows, using the terminology, notation, and definitions related to sampling you saw in Section 8.3:

• In Chapter 9, we’ll cover examples of statistical inference for
  • Scenario 2: The mean age \(\mu\) of all pennies in circulation in the US.
  • Scenario 3: The difference \(p_1 - p_2\) in the proportion of people who yawn when seeing someone else yawn and the proportion of people who yawn without seeing someone else yawn. This is an example of two-sample inference.
• In Chapter 10, we’ll cover an example of statistical inference for
  • Scenario 4: The difference \(\mu_1 - \mu_2\) in average IMDB ratings for action and romance movies. This is another example of two-sample inference.
• In Chapter 11, we’ll cover an example of statistical inference for the relationship between teaching score and various instructor demographic variables you saw in Chapter 6 on basic regression and Chapter 7 on multiple regression. Specifically:
  • Scenario 5: The intercept \(\beta_0\) of some population regression line.
  • Scenario 6: The slope \(\beta_1\) of some population regression line.

In Chapter 11 on inference for regression, we’ll cover Scenarios 5 & 6 about the regression line. In particular, we’ll see that the fitted regression line from Chapter 6 on basic regression, \(\widehat{y} = b_0 + b_1 \cdot x\), is in fact an estimate of some true population regression line \(y = \beta_0 + \beta_1 \cdot x\) based on a sample of \(n\) pairs of points \((x, y)\). Ex: Recall our sample of \(n=463\) instructors at UT Austin from the evals data set in Chapter 6. Based on the results of the fitted regression model of teaching score with beauty score as an explanatory/predictor variable, what can we say about this relationship for all instructors, not just those at UT Austin?
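As a sketch of what such an estimate looks like in code, assuming the evals data frame and the get_regression_table() function from the moderndive package, the fitted values \(b_0\) and \(b_1\) can be obtained with:

library(moderndive)

# Fit the regression of teaching score on beauty score for the sample
# of n = 463 UT Austin instructors. The resulting intercept b0 and
# slope b1 are point estimates of the population parameters beta0 and beta1.
score_model <- lm(score ~ bty_avg, data = evals)
get_regression_table(score_model)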


In most cases, we don’t have the population values as we did with the bowl of balls. We only have a single sample of data from a larger population. We’d like to be able to make some reasonable guesses about population parameters, using that single sample to create a range of plausible values for a population parameter. This range of plausible values is known as a confidence interval and will be the focus of the next chapter. And how do we use a single sample to get some idea of how other samples might vary in terms of their statistic values? One common way this is done is via a process known as bootstrapping, which will be the focus of the beginning sections of the next chapter.
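As a preview, here is a minimal sketch of such a bootstrap-based confidence interval using the infer package, assuming the pennies_sample data frame with its age_in_2011 variable from the moderndive package:

library(infer)
library(moderndive)

# Resample the single sample of pennies with replacement 1000 times,
# compute the mean of each resample, and take the middle 95% of those
# means as a range of plausible values for the population mean mu.
pennies_sample %>%
  specify(response = age_in_2011) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_ci(level = 0.95, type = "percentile")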





          8.7.5 Closing notes


          This chapter serves as an introduction to the theoretical underpinning of the statistical inference techniques that will be discussed in greater detail in Chapter 9 for confidence intervals and Chapter 10 for hypothesis testing.


          8.5.3 Additional resources

          An R script file of all R code used in this chapter is available here.


          8.5.4 What’s to come?


          Recall in our Obama poll case study in Section 8.4 that based on this particular sample, the Harvard University Institute of Politics’ best guess of Obama’s approval rating among all young Americans was 41%. However, this isn’t the end of the story. If you read further in the article, it states:


          The online survey of 2,089 adults was conducted from Oct. 30 to Nov. 11, just weeks after the federal government shutdown ended and the problems surrounding the implementation of the Affordable Care Act began to take center stage. The poll’s margin of error was plus or minus 2.1 percentage points.


Note the term margin of error, which here is plus or minus 2.1 percentage points. What this is saying is that most polls won’t get it perfectly right; there will always be a certain amount of error caused by sampling variation. The margin of error of plus or minus 2.1 percentage points is saying that a typical range of errors for polls of this type is about \(\pm\) 2.1%, in other words from about 2.1% too small to about 2.1% too big, for an interval of [41% - 2.1%, 41% + 2.1%] = [38.9%, 43.1%]. Remember that this notation corresponds to 38.9% and 43.1% being included, as well as all numbers between the two of them. We’ll see in the next chapter that such intervals are known as confidence intervals.
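The arithmetic behind this interval is simple enough to verify directly; a quick sketch in R:

# Reported point estimate and margin of error from the poll:
p_hat <- 0.41
moe <- 0.021

# Range of plausible values for the true approval rating p:
c(lower = p_hat - moe, upper = p_hat + moe)
# lower upper 
# 0.389 0.431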

diff --git a/docs/9-confidence-intervals.html b/docs/9-confidence-intervals.html
index 39a0822a0..94c42a1d9 100644
--- a/docs/9-confidence-intervals.html
+++ b/docs/9-confidence-intervals.html
@@ -1007,7 +1009,7 @@

          9.4 Comparing bootstrap and sampl

To help build up the idea of a confidence interval, we weren’t completely honest in our initial discussion. The pennies_sample data frame represents a sample from a larger number of pennies stored as pennies in the moderndive package. The pennies data frame (also in the moderndive package) contains 800 rows of data and two columns pertaining to the same variables as pennies_sample. Let’s begin by understanding some of the properties of the age_in_2011 variable in the pennies data frame.

          ggplot(pennies, aes(x = age_in_2011)) +
             geom_histogram(bins = 10, color = "white")
          -

          +

          pennies %>% 
             summarize(mean_age = mean(age_in_2011),
                       median_age = median(age_in_2011))
          @@ -1018,7 +1020,7 @@

          9.4 Comparing bootstrap and sampl

          We see that pennies is slightly right-skewed with the mean being pulled towards the upper outliers. Recall that pennies_sample was more symmetric than pennies. In fact, it actually exhibited some left-skew as we compare the mean and median values.

          ggplot(pennies_sample, aes(x = age_in_2011)) +
             geom_histogram(bins = 10, color = "white")
          -

          +

          pennies_sample %>% 
             summarize(mean_age = mean(age_in_2011), median_age = median(age_in_2011))
          # A tibble: 1 x 2
          @@ -1040,8 +1042,8 @@ 

          Sampling distribution

          -->
          ggplot(sampling_distribution, aes(x = stat)) +
             geom_histogram(bins = 10, fill = "salmon", color = "white")
          -

          FIGURE 9.1: Sampling distribution for n=40 samples of pennies

          @@ -1059,7 +1061,7 @@

          Bootstrap distribution

          Let’s now see how the shape of the bootstrap distribution compares to that of the sampling distribution. We’ll shade the bootstrap distribution blue to further assist with remembering which is which.

          bootstrap_distribution %>% 
             visualize(bins = 10, fill = "blue")
          -

          +

          bootstrap_distribution %>% 
             summarize(se = sd(stat))
          # A tibble: 1 x 1
          @@ -1103,17 +1105,18 @@ 

9.5 Interpreting the confidence i

  specify(formula = age_in_2011 ~ NULL) %>% 
  generate(reps = 1000) %>% 
  calculate(stat = "mean") %>% 
  get_ci()
Setting `type = "bootstrap"` in `generate()`.
percentile_ci2
          # A tibble: 1 x 2
             `2.5%` `97.5%`
              <dbl>   <dbl>
           1   18.4    25.3

This new confidence interval also contains the value of \(\mu\). Let’s further investigate by repeating this process 100 times to get 100 different confidence intervals derived from 100 different samples of pennies. Each sample will have a size of 40, just as the original sample did. We will plot each of these confidence intervals as horizontal lines. We will also show a line corresponding to the known population value of 21.152 years.

          -

          +

          Of the 100 confidence intervals based on samples of size \(n = 40\), 96 of them captured the population mean \(\mu = 21.152\), whereas 4 of them did not include it. If we repeated this process of building confidence intervals more times with more samples, we’d expect 95% of them to contain the population mean. In other words, the procedure we have used to generate confidence intervals is “95% reliable” in that we can expect it to include the true population parameter 95% of the time if the process is repeated.

          To further accentuate this point, let’s perform a similar procedure using 90% confidence intervals instead. This time we will use the standard error method instead of the percentile method for computing the confidence intervals.

          -

          +

Of the 100 confidence intervals based on samples of size \(n = 40\), 87 of them captured the population mean \(\mu = 21.152\), whereas 13 of them did not include it. Repeating this process for more samples would result in us getting closer and closer to 90% of the confidence intervals including the true value. When interpreting a confidence interval, it is common to say we are “95% confident” or “90% confident” that the true value falls within the range of the specified confidence interval. We will use this “confident” language throughout the rest of this chapter, but remember that it has more to do with a measure of reliability of the building process.

          Back to our pennies example

          @@ -1160,6 +1163,7 @@

          9.6.2 Bootstrap distribution

          tactile_shovel1 %>% 
             specify(formula = color ~ NULL, success = "red") %>% 
             generate(reps = 10000)
          +
          Setting `type = "bootstrap"` in `generate()`.

          This results in 50 rows for each of the 10,000 replicates. Lastly, we finish the infer pipeline by adding back in the calculate() step.

          bootstrap_props <- tactile_shovel1 %>% 
             specify(formula = color ~ NULL, success = "red") %>% 
          @@ -1168,7 +1172,7 @@ 

          9.6.2 Bootstrap distribution

          Let’s visualize() what the resulting bootstrap distribution looks like as a histogram. We’ve adjusted the number of bins here as well to better see the resulting shape.

          bootstrap_props %>% 
             visualize(bins = 25)
          -

          +

          We see that the resulting distribution is symmetric and bell-shaped so it doesn’t much matter which confidence interval method we choose. Let’s use the standard error method to create a 95% confidence interval.

          standard_error_ci <- bootstrap_props %>% 
             get_ci(type = "se", level = 0.95, point_estimate = p_hat)
          @@ -1179,7 +1183,7 @@ 

          9.6.2 Bootstrap distribution

          1 0.284 0.556
          bootstrap_props %>% 
             visualize(bins = 25, endpoints = standard_error_ci)
          -

          +

          We are 95% confident that the true proportion of red balls in the bowl is between 0.284 and 0.556. This level of confidence is based on the standard error-based method including the true proportion 95% of the time if many different samples (not just the one we used) were collected and confidence intervals were created.

          @@ -1230,7 +1234,7 @@

          Confidence intervals based on 33 tactile samples

          conf_ints
          @@ -2301,9 +2305,10 @@

          9.7.2 Bootstrap distribution

specify(formula = yawn ~ group, success = "yes") %>% 
  generate(reps = 1000) %>% 
  calculate(stat = "diff in props", order = c("seed", "control"))
          Setting `type = "bootstrap"` in `generate()`.
          bootstrap_distribution %>% 
             visualize(bins = 20)
          -

          +

          This distribution is roughly symmetric and bell-shaped but isn’t quite there. Let’s use the percentile-based method to compute a 95% confidence interval for the true difference in the proportion of those that yawn with and without a seed presented. The arguments are explicitly listed here but remember they are the defaults and simply get_ci() can be used.

          bootstrap_distribution %>% 
             get_ci(type = "percentile", level = 0.95)
          @@ -2329,11 +2334,11 @@

          9.7.2 Bootstrap distribution

          9.8 Conclusion

          -
          +

          9.8.1 What’s to come?

          This chapter introduced the notions of bootstrapping and confidence intervals as ways to build intuition about population parameters using only the original sample information. We also concluded with a glimpse into statistical significance and we’ll dig much further into this in Chapter 10 up next!

          -
          +

          9.8.2 Script of R code

          An R script file of all R code used in this chapter is available here.

diff --git a/docs/A-appendixA.html b/docs/A-appendixA.html
index 857c2c81a..3f49a0128 100644
--- a/docs/A-appendixA.html
+++ b/docs/A-appendixA.html

          A.2 Normal distribution discussion

          +
diff --git a/docs/B-appendixB.html b/docs/B-appendixB.html
index 9508e1c7c..94ba3e2f6 100644
--- a/docs/B-appendixB.html
+++ b/docs/B-appendixB.html
null_distn_one_mean %>% visualize()
          -

          +

          We can next use this distribution to observe our \(p\)-value. Recall this is a right-tailed test so we will be looking for values that are greater than or equal to 23.44 for our \(p\)-value.

          null_distn_one_mean %>%
             visualize(obs_stat = x_bar, direction = "greater")
          -

          +

          Calculate \(p\)-value
          pvalue <- null_distn_one_mean %>%
          @@ -746,7 +739,7 @@ 

          Bootstrapping for confidence interval

          1 23.3 23.6
          boot_distn_one_mean %>% 
             visualize(endpoints = ci, direction = "between")
          -

          +

          We see that 23 is not contained in this confidence interval as a plausible value of \(\mu\) (the unknown population mean) and the entire interval is larger than 23. This matches with our hypothesis test results of rejecting the null hypothesis in favor of the alternative (\(\mu > 23\)).

          Interpretation: We are 95% confident the true mean age of first marriage for all US women from 2006 to 2010 is between 23.316 and 23.565.


          @@ -878,11 +871,11 @@

          Simulation for hypothesis test

          generate(reps = 10000) %>% calculate(stat = "prop")
          null_distn_one_prop %>% visualize()
          -

          +

          We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are 0.8 - 0.73 = 0.07 away from 0.8 in BOTH directions for our \(p\)-value:

          null_distn_one_prop %>% 
             visualize(obs_stat = p_hat, direction = "both")
          -

          +

          Calculate \(p\)-value
          pvalue <- null_distn_one_prop %>% 
          @@ -919,7 +912,7 @@ 

          Bootstrapping for confidence interval

          1 0.64 0.81
          boot_distn_one_prop %>% 
             visualize(endpoints = ci, direction = "between")
          -

          +

          We see that 0.80 is contained in this confidence interval as a plausible value of \(\pi\) (the unknown population proportion). This matches with our hypothesis test results of failing to reject the null hypothesis.

          Interpretation: We are 95% confident the true proportion of customers who are satisfied with the service they receive is between 0.64 and 0.81.


          @@ -1076,11 +1069,11 @@

          Randomization for hypothesis test

          generate(reps = 10000) %>% calculate(stat = "diff in props", order = c("yes", "no"))
          null_distn_two_props %>% visualize()
          -

          +

          We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are greater than or equal to -0.099 or less than or equal to 0.099 for our \(p\)-value.

          null_distn_two_props %>% 
             visualize(obs_stat = d_hat, direction = "two_sided")
          -

          +

          Calculate \(p\)-value
          pvalue <- null_distn_two_props %>% 
          @@ -1109,7 +1102,7 @@ 

          Bootstrapping for confidence interval

          1 -0.161 -0.0378
          boot_distn_two_props %>% 
             visualize(endpoints = ci, direction = "between")
          -

          +

          We see that 0 is not contained in this confidence interval as a plausible value of \(\pi_{college} - \pi_{no\_college}\) (the unknown population parameter). This matches with our hypothesis test results of rejecting the null hypothesis. Since zero is not a plausible value of the population parameter, we have evidence that the proportion of college graduates in California with no opinion on drilling is different than that of non-college graduates.

Interpretation: We are 95% confident the true proportion of non-college graduates with no opinion on offshore drilling in California is between 0.16 and 0.04 lower than that of college graduates.


          @@ -1348,11 +1341,11 @@

          Randomization for hypothesis test

          calculate(stat = "diff in means", order = c("Sacramento_ CA", "Cleveland_ OH"))
          null_distn_two_means %>% visualize()
          -

          +

          We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are greater than or equal to 4960.477 or less than or equal to -4960.477 for our \(p\)-value.

          null_distn_two_means %>% 
             visualize(obs_stat = d_hat, direction = "both")
          -

          +

          Calculate \(p\)-value
          pvalue <- null_distn_two_means %>% 
          @@ -1382,7 +1375,7 @@ 

          Bootstrapping for confidence interval

          1 -1446. 11308.
          boot_distn_two_means %>% 
             visualize(endpoints = ci, direction = "between")
          -

          +

          We see that 0 is contained in this confidence interval as a plausible value of \(\mu_{sac} - \mu_{cle}\) (the unknown population parameter). This matches with our hypothesis test results of failing to reject the null hypothesis. Since zero is a plausible value of the population parameter, we do not have evidence that Sacramento incomes are different than Cleveland incomes.

Interpretation: We are 95% confident the true mean yearly income for those living in Sacramento is between 1445.53 dollars lower and 11307.82 dollars higher than for Cleveland.

          Note: You could also use the null distribution based on randomization with a shift to have its center at \(\bar{x}_{sac} - \bar{x}_{cle} = \$4960.48\) instead of at 0 and calculate its percentiles. The confidence interval produced via this method should be comparable to the one done using bootstrapping above.

          @@ -1528,11 +1521,11 @@

          Bootstrapping for hypothesis test

          generate(reps = 10000) %>% calculate(stat = "mean")
          null_distn_paired_means %>% visualize()
          -

          +

          We can next use this distribution to observe our \(p\)-value. Recall this is a left-tailed test so we will be looking for values that are less than or equal to 4960.477 for our \(p\)-value.

          null_distn_paired_means %>% 
             visualize(obs_stat = d_hat, direction = "less")
          -

          +

          Calculate \(p\)-value
          pvalue <- null_distn_paired_means %>% 
          @@ -1561,7 +1554,7 @@ 

          Bootstrapping for confidence interval

          1 -0.112 -0.0503
          boot_distn_paired_means %>% 
             visualize(endpoints = ci, direction = "between")
          -

          +

          We see that 0 is not contained in this confidence interval as a plausible value of \(\mu_{diff}\) (the unknown population parameter). This matches with our hypothesis test results of rejecting the null hypothesis. Since zero is not a plausible value of the population parameter and since the entire confidence interval falls below zero, we have evidence that surface zinc concentration levels are lower, on average, than bottom level zinc concentrations.

Interpretation: We are 95% confident the true mean zinc concentration on the surface is between 0.11 and 0.05 units lower than on the bottom.


diff --git a/docs/C-appendixC.html b/docs/C-appendixC.html
index f7f5eb315..7169fbe9d 100644
--- a/docs/C-appendixC.html
+++ b/docs/C-appendixC.html

              C.1 Sorted barplots

              ggplot(data = flights, mapping = aes(x = carrier)) +
                 geom_bar() +
                 scale_x_discrete(limits = names(sorted_flights))
              -

              FIGURE C.1: Number of flights departing NYC in 2013 by airline - Descending numbers

              @@ -587,8 +580,8 @@

              C.2.1 Interactive linegraphs

              rownames(flights_summarized) <- flights_summarized$date flights_summarized <- select(flights_summarized, -date) dyRangeSelector(dygraph(flights_summarized))
              -
              - +
              +


The syntax here is a little different than what we have covered so far. The dygraph function expects the dates to be given as the rownames of the object. We then remove the date variable from the flights_summarized data frame since it is accounted for in the rownames. Lastly, we run the dygraph function on the new data frame that only contains the median arrival delay as a column, and then provide a selector to zoom in on the interactive plot via dyRangeSelector. (Note that this plot will only be interactive in the HTML version of this book.)

              D.1 Chapter 2 Solutions

              library(dplyr)
               library(ggplot2)
               library(nycflights13)

              (LC2.1) Repeat the above installing steps, but for the dplyr, nycflights13, and knitr packages. This will install the earlier mentioned dplyr package, the nycflights13 package containing data on all domestic flights leaving a NYC airport in 2013, and the knitr package for writing reports in R.

              +

              (LC2.2) “Load” the dplyr, nycflights13, and knitr packages as well by repeating the above steps.

              +

              Solution: If the following code runs with no errors, you’ve succeeded!

library(dplyr)
library(nycflights13)
library(knitr)

              (LC2.3) What does any ONE row in this flights dataset refer to?

              • A. Data on an airline
              • B. Data on a flight
              • @@ -555,7 +556,7 @@

                D.1 Chapter 2 Solutions

              • a flight path would be United 1545 to Houston
              • a flight would be United 1545 to Houston at a specific date/time. For example: 2013/1/1 at 5:15am.

              (LC2.4) What are some examples in this dataset of categorical variables? What makes them different than quantitative variables?

              Solution: Hint: Type ?flights in the console to see what all the variables mean!

              • Categorical: @@ -570,13 +571,21 @@

                D.1 Chapter 2 Solutions

              • time_hour time

            (LC2.5) What does int, dbl, and chr mean in the output above?

            Solution:

            • int: integer. Used to count things i.e. a discrete value. Ex: the # of cars parked in a lot
            • dbl: double. Used to measure things. i.e. a continuous value. Ex: your height in inches
            • chr: character. i.e. text
            +

            (LC2.6) What properties of the observational unit do each of lat, lon, alt, tz, dst, and tzone describe for the airports data frame? Note that you may want to use ?airports to get more information.

            +

Solution: lat and lon represent the airport’s geographic coordinates, alt is the altitude above sea level of the airport (Run airports %>% filter(faa == "DEN") to see the altitude of Denver International Airport), tz is the time zone difference with respect to GMT in London UK, dst is the daylight savings time zone, and tzone is the time zone label.

            +

            (LC2.7) Provide the names of variables in a data frame with at least three variables in which one of them is an identification variable and the other two are not. In other words, create your own tidy dataset that matches these conditions.

            +

            Solution:

            +
              +
            • In the weather example in LC3.8, the combination of origin, year, month, day, hour are identification variables as they identify the observation in question.
            • +
            • Anything else pertains to observations: temp, humid, wind_speed, etc.
            • +

        37. @@ -598,7 +607,7 @@

          D.2 Chapter 3 Solutions

Solution: There are many possibilities for this one; see the plot below. Is there a pattern in departure delay depending on when the flight is scheduled to depart? Interestingly, there seem to be only two blocks of time when flights depart.

          ggplot(data = alaska_flights, mapping = aes(x = dep_time, y = dep_delay)) +
             geom_point()
          -

          +

          (LC3.7) Why is setting the alpha argument value useful with scatterplots? What further information does it give you that a regular scatterplot cannot?

          Solution: Why is setting the alpha argument value useful with scatterplots? What further information does it give you that a regular scatterplot cannot? It thins out the points so we address overplotting. But more importantly it hints at the (statistical) density and distribution of the points: where are the points concentrated, where do they occur. We will see more about densities and distributions in Chapter 6 when we switch gears to statistical topics.

          (LC3.8) After viewing the Figure 3.4 above, give an approximate range of arrival delays and departure delays that occur the most frequently. How has that region changed compared to when you observed the same plot without the alpha = 0.2 set in Figure 3.2?

          @@ -615,8 +624,8 @@

          D.2 Chapter 3 Solutions

Solution: Plot a time series of a variable other than temp for Newark Airport in the first 15 days of January 2013. Humidity is a good one to look at, since it is very closely related to the cycles of a day.

          ggplot(data = early_january_weather, mapping = aes(x = time_hour, y = humid)) +
             geom_line()
          -

          -

          (LC3.14) What does changing the number of bins from 30 to 60 tell us about the distribution of temperatures?

          +

          +

          (LC3.14) What does changing the number of bins from 30 to 40 tell us about the distribution of temperatures?

Solution: The distribution doesn’t change much. But by refining the bin width, we see that the temperature data has a high degree of accuracy. What do I mean by accuracy? Looking at the temp variable by View(weather), we see that the precision of each temperature recording is 2 decimal places.

          (LC3.15) Would you classify the distribution of temperatures as symmetric or skewed?

          Solution: It is rather symmetric, i.e. there are no long tails on only one side of the distribution

          @@ -644,7 +653,7 @@

          D.2 Chapter 3 Solutions

          (LC3.20) For which types of data-sets would these types of faceted plots not work well in comparing relationships between variables? Give an example describing the nature of these variables and other important characteristics.

          Solution:

          • We’d have 365 facets to look at. Way too many.
          • We don’t really care about day-to-day fluctuation in weather so much, but maybe more week-to-week variation. We’d like to focus on seasonal trends.

          (LC3.21) Does the temp variable in the weather data-set have a lot of variability? Why do you say that?

          @@ -653,12 +662,12 @@

          D.2 Chapter 3 Solutions

          Solution: It appears to be an outlier. Let’s revisit the use of the filter command to hone in on it. We want all data points where the month is 5 and temp<25

          weather %>% 
             filter(month==5 & temp < 25)
# A tibble: 1 x 16
   origin  year month   day  hour  temp  dewp humid wind_dir wind_speed wind_gust
   <chr>  <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>     <dbl>
 1 JFK     2013     5     8    22  13.1  12.0  95.3       80       8.06        NA
# … with 5 more variables: precip <dbl>, pressure <dbl>, visib <dbl>,
#   time_hour <dttm>, temp_in_C <dbl>

          There appears to be only one hour and only at JFK that recorded 13.1 F (-10.5 C) in the month of May. This is probably a data entry mistake! Why wasn’t the weather at least similar at EWR (Newark) and LGA (La Guardia)?

          (LC3.23) Which months have the highest variability in temperature? What reasons do you think this is?

          Solution: We are now interested in the spread of the data. One measure some of you may have seen previously is the standard deviation. But in this plot we can read off the Interquartile Range (IQR):

          @@ -791,8 +800,8 @@

          D.2 Chapter 3 Solutions

TABLE 9.2: 33 confidence intervals from 33 tactile samples of size n=50

          (LC3.24) We looked at the distribution of a numerical variable over a categorical variable here with this boxplot. Why can’t we look at the distribution of one numerical variable over the distribution of another numerical variable? Say, temperature across pressure, for example?

          -

          Solution: Because we need a way to group many numerical observations together, say by grouping by month. For pressure, we have near unique values for pressure, i.e. no groups, so we can’t make boxplots.

          +

          (LC3.24) We looked at the distribution of the numerical variable temp split by the numerical variable month that we converted to a categorical variable using the factor() function. Why would a boxplot of temp split by the numerical variable pressure similarly converted to a categorical variable using the factor() not be informative?

          +

          Solution: Because there are 12 unique values of month yielding only 12 boxes in our boxplot. There are many more unique values of pressure (469 unique values in fact), because values are to the first decimal place. This would lead to 469 boxes, which is too many for people to digest.

          (LC3.25) Boxplots provide a simple way to identify outliers. Why may outliers be easier to identify when looking at a boxplot instead of a faceted histogram?

Solution: In a histogram, the bin corresponding to where an outlier lies may not be high enough for us to see. In a boxplot, outliers are explicitly labelled separately.

          (LC3.26) Why are histograms inappropriate for visualizing categorical variables?

          @@ -825,145 +834,8 @@

          D.2 Chapter 3 Solutions

          D.3 Chapter 4 Solutions

          library(dplyr)
           library(ggplot2)
          -library(nycflights13)
          -library(tidyr)
          -library(readr)
          -

          (LC4.1) Consider the following data frame of average number of servings of beer, spirits, and wine consumption in three countries as reported in the FiveThirtyEight article Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?

          -
          # A tibble: 3 x 4
          -  country     beer_servings spirit_servings wine_servings
          -  <chr>               <int>           <int>         <int>
          -1 Canada                240             122           100
          -2 South Korea           140              16             9
          -3 USA                   249             158            84
          -

          This data frame is not in tidy format. What would it look like if it were?

          -

          Solution: There are three variables of information included: country, alcohol type, and number of servings. In tidy format, each of these variables of information are included in their own column.

          -
          # A tibble: 9 x 3
          -  country     `alcohol type` servings
          -  <chr>       <chr>             <int>
          -1 Canada      beer                240
          -2 Canada      spirit              122
          -3 Canada      wine                100
          -4 South Korea beer                140
          -5 South Korea spirit               16
          -6 South Korea wine                  9
          -7 USA         beer                249
          -8 USA         spirit              158
          -9 USA         wine                 84
          -

          Note that how the rows are sorted is inconsequential in whether or not the data frame is in tidy format. In other words, the following data frame sorted by alcohol type instead of country is equally in tidy format.

          -
          # A tibble: 9 x 3
          -  country     `alcohol type` servings
          -  <chr>       <chr>             <int>
          -1 Canada      beer                240
          -2 South Korea beer                140
          -3 USA         beer                249
          -4 Canada      spirit              122
          -5 South Korea spirit               16
          -6 USA         spirit              158
          -7 Canada      wine                100
          -8 South Korea wine                  9
          -9 USA         wine                 84
          -

          (LC4.2) What properties of the observational unit do each of lat, lon, alt, tz, dst, and tzone describe for the airports data frame? Note that you may want to use ?airports to get more information.

          -

Solution: lat and lon represent the airport's geographic coordinates, alt is the altitude of the airport above sea level (run airports %>% filter(faa == "DEN") to see the altitude of Denver International Airport), tz is the time zone difference with respect to GMT in London UK, dst is the daylight saving time zone, and tzone is the time zone label.
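For instance, a quick way to run the altitude check mentioned above (a minimal sketch, assuming dplyr and nycflights13 are loaded):

library(dplyr)
library(nycflights13)

# Look up Denver International Airport and its altitude above sea level
airports %>% 
  filter(faa == "DEN") %>% 
  select(faa, name, alt)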

          -

          (LC4.3) Provide the names of variables in a data frame with at least three variables in which one of them is an identification variable and the other two are not. In other words, create your own tidy dataset that matches these conditions.

          -

          Solution:

• In the weather example in LC3.8, the combination of origin, year, month, day, and hour are identification variables as they identify the observation in question.
• Anything else pertains to observations: temp, humid, wind_speed, etc.

          (LC4.4) Convert the dem_score data frame into a tidy data frame and assign the name of dem_score_tidy to the resulting long-formatted data frame.

          -

          Solution: Running the following in the console:

          -
dem_score_tidy <- gather(data = dem_score, key = year, value = democracy_score, -country)
          -

          Let’s now compare the dem_score and dem_score_tidy. dem_score has democracy score information for each year in columns, whereas in dem_score_tidy there are explicit variables year and democracy_score. While both representations of the data contain the same information, we can only use ggplot() to create plots using the dem_score_tidy data frame.

          -
          dem_score
          -
          # A tibble: 96 x 10
          -   country    `1952` `1957` `1962` `1967` `1972` `1977` `1982` `1987` `1992`
          -   <chr>       <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
          - 1 Albania        -9     -9     -9     -9     -9     -9     -9     -9      5
          - 2 Argentina      -9     -1     -1     -9     -9     -9     -8      8      7
          - 3 Armenia        -9     -7     -7     -7     -7     -7     -7     -7      7
          - 4 Australia      10     10     10     10     10     10     10     10     10
          - 5 Austria        10     10     10     10     10     10     10     10     10
          - 6 Azerbaijan     -9     -7     -7     -7     -7     -7     -7     -7      1
          - 7 Belarus        -9     -7     -7     -7     -7     -7     -7     -7      7
          - 8 Belgium        10     10     10     10     10     10     10     10     10
          - 9 Bhutan        -10    -10    -10    -10    -10    -10    -10    -10    -10
          -10 Bolivia        -4     -3     -3     -4     -7     -7      8      9      9
          -# … with 86 more rows
          -
          dem_score_tidy
          -
          # A tibble: 864 x 3
          -   country    year  democracy_score
          -   <chr>      <chr>           <dbl>
          - 1 Albania    1952               -9
          - 2 Argentina  1952               -9
          - 3 Armenia    1952               -9
          - 4 Australia  1952               10
          - 5 Austria    1952               10
          - 6 Azerbaijan 1952               -9
          - 7 Belarus    1952               -9
          - 8 Belgium    1952               10
          - 9 Bhutan     1952              -10
          -10 Bolivia    1952               -4
          -# … with 854 more rows
          -

          (LC4.5) Read in the life expectancy data stored at https://moderndive.com/data/le_mess.csv and convert it to a tidy data frame.

          -

          Solution: The code is similar

          -
          life_expectancy <- read_csv('https://moderndive.com/data/le_mess.csv')
          -life_expectancy_tidy <- gather(data = life_expectancy, key = year, value = life_expectancy, -country)
          -

We observe the same structure with respect to year in life_expectancy vs life_expectancy_tidy as we did in dem_score vs dem_score_tidy:

          -
          life_expectancy
          -
          # A tibble: 202 x 67
          -   country `1951` `1952` `1953` `1954` `1955` `1956` `1957` `1958` `1959` `1960`
          -   <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
          - 1 Afghan…   27.1   27.7   28.2   28.7   29.3   29.8   30.3   30.9   31.4   31.9
          - 2 Albania   54.7   55.2   55.8   56.6   57.4   58.4   59.5   60.6   61.8   62.9
          - 3 Algeria   43.0   43.5   44.0   44.4   44.9   45.4   45.9   46.4   47.0   47.5
          - 4 Angola    31.0   31.6   32.1   32.7   33.2   33.8   34.3   34.9   35.4   36.0
          - 5 Antigu…   58.3   58.8   59.3   59.9   60.4   60.9   61.4   62.0   62.5   63.0
          - 6 Argent…   61.9   62.5   63.1   63.6   64.0   64.4   64.7   65     65.2   65.4
          - 7 Armenia   62.7   63.1   63.6   64.1   64.5   65     65.4   65.9   66.4   66.9
          - 8 Aruba     59.0   60.0   61.0   61.9   62.7   63.4   64.1   64.7   65.2   65.7
          - 9 Austra…   68.7   69.1   69.7   69.8   70.2   70.0   70.3   70.9   70.4   70.9
          -10 Austria   65.2   66.8   67.3   67.3   67.6   67.7   67.5   68.5   68.4   68.8
          -# … with 192 more rows, and 56 more variables: `1961` <dbl>, `1962` <dbl>,
          -#   `1963` <dbl>, `1964` <dbl>, `1965` <dbl>, `1966` <dbl>, `1967` <dbl>,
          -#   `1968` <dbl>, `1969` <dbl>, `1970` <dbl>, `1971` <dbl>, `1972` <dbl>,
          -#   `1973` <dbl>, `1974` <dbl>, `1975` <dbl>, `1976` <dbl>, `1977` <dbl>,
          -#   `1978` <dbl>, `1979` <dbl>, `1980` <dbl>, `1981` <dbl>, `1982` <dbl>,
          -#   `1983` <dbl>, `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>,
          -#   `1988` <dbl>, `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>,
          -#   `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>,
          -#   `1998` <dbl>, `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>,
          -#   `2003` <dbl>, `2004` <dbl>, `2005` <dbl>, `2006` <dbl>, `2007` <dbl>,
          -#   `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
          -#   `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>
          -
          life_expectancy_tidy
          -
          # A tibble: 13,332 x 3
          -   country             year  life_expectancy
          -   <chr>               <chr>           <dbl>
          - 1 Afghanistan         1951             27.1
          - 2 Albania             1951             54.7
          - 3 Algeria             1951             43.0
          - 4 Angola              1951             31.0
          - 5 Antigua and Barbuda 1951             58.3
          - 6 Argentina           1951             61.9
          - 7 Armenia             1951             62.7
          - 8 Aruba               1951             59.0
          - 9 Australia           1951             68.7
          -10 Austria             1951             65.2
          -# … with 13,322 more rows
          -

          (LC4.6) What are common characteristics of “tidy” datasets?

          -

          Solution: Rows correspond to observations, while columns correspond to variables.

          -

          (LC4.7) What makes “tidy” datasets useful for organizing data?

          -

          Solution: Tidy datasets are an organized way of viewing data. We’ll see later that this format is required for the ggplot2 and dplyr packages for data visualization and wrangling.

          -

          (LC4.8) What are some advantages of data in normal forms? What are some disadvantages?

          -

Solution: When datasets are in normal form, we can easily join them with other datasets! For example, we can join the flights data with the planes data. We'll see this more in Chapter 5!
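For example, a minimal sketch of such a join (assuming dplyr and nycflights13 are loaded; this is the same join used in the available seat miles exercise in the Chapter 5 solutions below):

library(dplyr)
library(nycflights13)

# Match each flight to the characteristics of the plane that flew it,
# using the tail number as the key variable
flights %>% 
  inner_join(planes, by = "tailnum")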

          -
          -
          -
          -

          D.4 Chapter 5 Solutions

          -
          library(dplyr)
          -library(ggplot2)
           library(nycflights13)
          -

          (LC5.1) What’s another way using the “not” operator ! we could filter only the rows that are not going to Burlington, VT nor Seattle, WA in the flights data frame? Test this out using the code above.

          +

          (LC4.1) What’s another way using the “not” operator ! to filter only the rows that are not going to Burlington, VT nor Seattle, WA in the flights data frame? Test this out using the code above.

          Solution:

          # Original in book
           not_BTV_SEA <- flights %>% 
          @@ -976,13 +848,13 @@ 

          D.4 Chapter 5 Solutions

# Yet another way
not_BTV_SEA <- flights %>% 
  filter(dest != "BTV" & dest != "SEA")
          -

          (LC5.2) Say a doctor is studying the effect of smoking on lung cancer for a large number of patients who have records measured at five year intervals. She notices that a large number of patients have missing data points because the patient has died, so she chooses to ignore these patients in her analysis. What is wrong with this doctor’s approach?

          +

          (LC4.2) Say a doctor is studying the effect of smoking on lung cancer for a large number of patients who have records measured at five year intervals. She notices that a large number of patients have missing data points because the patient has died, so she chooses to ignore these patients in her analysis. What is wrong with this doctor’s approach?

Solution: The missing patients may have died of lung cancer! So to ignore them might seriously bias your results! It is very important to think about the consequences of ignoring missing data for your analysis! Ask yourself:

• Is there a systematic reason why certain values are missing? If so, you might be biasing your results!
• If there isn't, then it might be OK to "sweep missing values under the rug."
          -

          (LC5.3) Modify the above summarize function to create summary_temp to also use the n() summary function: summarize(count = n()). What does the returned value correspond to?

          +

          (LC4.3) Modify the above summarize function to create summary_temp to also use the n() summary function: summarize(count = n()). What does the returned value correspond to?

          Solution: It corresponds to a count of the number of observations/rows:

          weather %>% 
             summarize(count = n())
          @@ -990,7 +862,7 @@

          D.4 Chapter 5 Solutions

  count
  <int>
1 26115
          -

          (LC5.4) Why doesn’t the following code work? Run the code line by line instead of all at once, and then look at the data. In other words, run summary_temp <- weather %>% summarize(mean = mean(temp, na.rm = TRUE)) first.

          +

          (LC4.4) Why doesn’t the following code work? Run the code line by line instead of all at once, and then look at the data. In other words, run summary_temp <- weather %>% summarize(mean = mean(temp, na.rm = TRUE)) first.

          summary_temp <- weather %>%   
             summarize(mean = mean(temp, na.rm = TRUE)) %>% 
             summarize(std_dev = sd(temp, na.rm = TRUE))
          @@ -1002,182 +874,11 @@

          D.4 Chapter 5 Solutions

   mean
  <dbl>
1  55.3

          Because after the first summarize(), the variable temp disappears as it has been collapsed to the value mean. So when we try to run the second summarize(), it can’t find the variable temp to compute the standard deviation of.
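One way to avoid the problem (a minimal sketch, assuming dplyr, nycflights13, and hence the weather data frame are loaded) is to compute both summaries in a single summarize() call:

summary_temp <- weather %>% 
  summarize(mean = mean(temp, na.rm = TRUE), 
            std_dev = sd(temp, na.rm = TRUE))
summary_temp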

          -

          (LC5.5) Recall from Chapter 3 when we looked at plots of temperatures by months in NYC. What does the standard deviation column in the summary_monthly_temp data frame tell us about temperatures in New York City throughout the year?

          +

          (LC4.5) Recall from Chapter 3 when we looked at plots of temperatures by months in NYC. What does the standard deviation column in the summary_monthly_temp data frame tell us about temperatures in New York City throughout the year?

          Solution:

month   mean  std_dev
    1   35.6    10.22
    2   34.3     6.98
    3   39.9     6.25
    4   51.7     8.79
    5   61.8     9.68
    6   72.2     7.55
    7   80.1     7.12
    8   74.5     5.19
    9   67.4     8.47
   10   60.1     8.85
   11   45.0    10.44
   12   38.4     9.98

          The standard deviation is a quantification of spread and variability. We see that the period in November, December, and January has the most variation in weather, so you can expect very different temperatures on different days.

          -

          (LC5.6) What code would be required to get the mean and standard deviation temperature for each day in 2013 for NYC?

          +

          (LC4.6) What code would be required to get the mean and standard deviation temperature for each day in 2013 for NYC?

          Solution:

          -
          summary_temp_by_day <- weather %>% 
          -  group_by(year, month, day) %>% 
          -  summarize(
          -          mean = mean(temp, na.rm = TRUE),
          -          std_dev = sd(temp, na.rm = TRUE)
          -          )
          -summary_temp_by_day
          -
          # A tibble: 364 x 5
          -# Groups:   year, month [?]
          -    year month   day  mean std_dev
          -   <dbl> <dbl> <int> <dbl>   <dbl>
          - 1  2013     1     1  37.0    4.00
          - 2  2013     1     2  28.7    3.45
          - 3  2013     1     3  30.0    2.58
          - 4  2013     1     4  34.9    2.45
          - 5  2013     1     5  37.2    4.01
          - 6  2013     1     6  40.1    4.40
          - 7  2013     1     7  40.6    3.68
          - 8  2013     1     8  40.1    5.77
          - 9  2013     1     9  43.2    5.40
          -10  2013     1    10  43.8    2.95
          -# … with 354 more rows

Note: group_by(day) is not enough, because day is a value between 1 and 31. We need to group_by(year, month, day).

          library(dplyr)
           library(nycflights13)
          @@ -1188,842 +889,20 @@ 

          D.4 Chapter 5 Solutions

  mean = mean(temp, na.rm = TRUE), 
  std_dev = sd(temp, na.rm = TRUE)
)
          -

          (LC5.7) Recreate by_monthly_origin, but instead of grouping via group_by(origin, month), group variables in a different order group_by(month, origin). What differs in the resulting dataset?

          +

          (LC4.7) Recreate by_monthly_origin, but instead of grouping via group_by(origin, month), group variables in a different order group_by(month, origin). What differs in the resulting dataset?

          Solution:

          -
          by_monthly_origin <- flights %>% 
          -  group_by(month, origin) %>% 
          -  summarize(count = n())
          by_monthly_origin
month  origin  count
    1  EWR      9893
    1  JFK      9161
    1  LGA      7950
    2  EWR      9107
    2  JFK      8421
    2  LGA      7423
    3  EWR     10420
    3  JFK      9697
    3  LGA      8717
    4  EWR     10531
    4  JFK      9218
    4  LGA      8581
    5  EWR     10592
    5  JFK      9397
    5  LGA      8807
    6  EWR     10175
    6  JFK      9472
    6  LGA      8596
    7  EWR     10475
    7  JFK     10023
    7  LGA      8927
    8  EWR     10359
    8  JFK      9983
    8  LGA      8985
    9  EWR      9550
    9  JFK      8908
    9  LGA      9116
   10  EWR     10104
   10  JFK      9143
   10  LGA      9642
   11  EWR      9707
   11  JFK      8710
   11  LGA      8851
   12  EWR      9922
   12  JFK      9146
   12  LGA      9067

          In by_monthly_origin the month column is now first and the rows are sorted by month instead of origin. If you compare the values of count in by_origin_monthly and by_monthly_origin using the View() function, you’ll see that the values are actually the same, just presented in a different order.

          -

          (LC5.8) How could we identify how many flights left each of the three airports for each carrier?

          +

          (LC4.8) How could we identify how many flights left each of the three airports for each carrier?

          Solution: We could summarize the count from each airport using the n() function, which counts rows.

          -
          count_flights_by_airport <- flights %>% 
          -  group_by(origin, carrier) %>% 
          -  summarize(count=n())
          -
          count_flights_by_airport
origin  carrier  count
EWR     9E        1268
EWR     AA        3487
EWR     AS         714
EWR     B6        6557
EWR     DL        4342
EWR     EV       43939
EWR     MQ        2276
EWR     OO           6
EWR     UA       46087
EWR     US        4405
EWR     VX        1566
EWR     WN        6188
JFK     9E       14651
JFK     AA       13783
JFK     B6       42076
JFK     DL       20701
JFK     EV        1408
JFK     HA         342
JFK     MQ        7193
JFK     UA        4534
JFK     US        2995
JFK     VX        3596
LGA     9E        2541
LGA     AA       15459
LGA     B6        6002
LGA     DL       23067
LGA     EV        8826
LGA     F9         685
LGA     FL        3260
LGA     MQ       16928
LGA     OO          26
LGA     UA        8044
LGA     US       13136
LGA     WN        6087
LGA     YV         601

All remarkably similar! Note: the n() function counts rows, whereas the sum(VARIABLE_NAME) function sums all values of a certain numerical variable VARIABLE_NAME.
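For instance, a minimal sketch contrasting the two (assuming dplyr and nycflights13 are loaded; the column names num_flights and total_distance are just illustrative):

flights %>% 
  group_by(origin) %>% 
  summarize(num_flights = n(),                            # counts rows
            total_distance = sum(distance, na.rm = TRUE)) # sums a numerical variable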

          -

          (LC5.9) How does the filter operation differ from a group_by followed by a summarize?

          +

          (LC4.9) How does the filter operation differ from a group_by followed by a summarize?

          Solution:

          • filter picks out rows from the original dataset without modifying them, whereas
          • group_by %>% summarize computes summaries of numerical variables, and hence reports new values.
          -

          (LC5.10) What do positive values of the gain variable in flights correspond to? What about negative values? And what about a zero value?

          +

          (LC4.10) What do positive values of the gain variable in flights correspond to? What about negative values? And what about a zero value?

          Solution:

          • Say a flight departed 20 minutes late, i.e. dep_delay = 20
@@ -2032,141 +911,25 @@

            D.4 Chapter 5 Solutions

          • 0 means the departure and arrival time were the same, so no time was made up in the air. We see in most cases that the gain is near 0 minutes.
• I never understood this. If the pilot says "we're going to make up time in the air" by flying faster because of a delay, why don't you always just fly faster to begin with?
          -

          (LC5.11) Could we create the dep_delay and arr_delay columns by simply subtracting dep_time from sched_dep_time and similarly for arrivals? Try the code out and explain any differences between the result and what actually appears in flights.

          +

          (LC4.11) Could we create the dep_delay and arr_delay columns by simply subtracting dep_time from sched_dep_time and similarly for arrivals? Try the code out and explain any differences between the result and what actually appears in flights.

Solution: No, because you can't do direct arithmetic on times. The difference in time between 12:03 and 11:59 is 4 minutes, but 1203 - 1159 = 44.
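A quick way to check this yourself (a minimal sketch, assuming dplyr and nycflights13 are loaded; dep_delay_naive is just an illustrative name):

flights %>% 
  # dep_time and sched_dep_time are stored as integers in HHMM format,
  # so naive subtraction mixes hours and minutes
  mutate(dep_delay_naive = dep_time - sched_dep_time) %>% 
  select(dep_time, sched_dep_time, dep_delay, dep_delay_naive)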

          -

          (LC5.12) What can we say about the distribution of gain? Describe it in a few sentences using the plot and the gain_summary data frame values.

          +

          (LC4.12) What can we say about the distribution of gain? Describe it in a few sentences using the plot and the gain_summary data frame values.

Solution: Most of the time the gain is a little under zero, and it is almost always between -50 and 50 minutes. There are some extreme cases, however!

          -

          (LC5.13) Looking at Figure 4.7, when joining flights and weather (or, in other words, matching the hourly weather values with each flight), why do we need to join by all of year, month, day, hour, and origin, and not just hour?

          +

          (LC4.13) Looking at Figure 4.7, when joining flights and weather (or, in other words, matching the hourly weather values with each flight), why do we need to join by all of year, month, day, hour, and origin, and not just hour?

Solution: Because hour is simply a value between 0 and 23; to identify a specific hour, we also need to know the year, month, and day, and origin tells us at which airport the weather was measured.
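As a sketch of the join itself (assuming dplyr and nycflights13 are loaded; the name flights_weather_joined is just illustrative):

flights_weather_joined <- flights %>% 
  inner_join(weather, by = c("year", "month", "day", "hour", "origin"))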

          -

          (LC5.14) What surprises you about the top 10 destinations from NYC in 2013?

          +

          (LC4.14) What surprises you about the top 10 destinations from NYC in 2013?

          Solution: This question is subjective! What surprises me is the high number of flights to Boston. Wouldn’t it be easier and quicker to take the train?
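If you want to recreate the top 10 destinations yourself, here is one possible sketch (assuming dplyr and nycflights13 are loaded; num_flights is just an illustrative name):

flights %>% 
  group_by(dest) %>% 
  summarize(num_flights = n()) %>% 
  arrange(desc(num_flights)) %>% 
  slice(1:10)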

          -

          (LC5.15) What are some ways to select all three of the dest, air_time, and distance variables from flights? Give the code showing how to do this in at least three different ways.

          +

          (LC4.15) What are some advantages of data in normal forms? What are some disadvantages?

          +

Solution: When datasets are in normal form, we can easily join them with other datasets! For example, we can join the flights data with the planes data.

          +

          (LC4.16) What are some ways to select all three of the dest, air_time, and distance variables from flights? Give the code showing how to do this in at least three different ways.

          Solution:

          -
          # The regular way:
          -flights %>% 
          -  select(dest, air_time, distance)
          -
          # A tibble: 336,776 x 3
          -   dest  air_time distance
          -   <chr>    <dbl>    <dbl>
          - 1 IAH        227     1400
          - 2 IAH        227     1416
          - 3 MIA        160     1089
          - 4 BQN        183     1576
          - 5 ATL        116      762
          - 6 ORD        150      719
          - 7 FLL        158     1065
          - 8 IAD         53      229
          - 9 MCO        140      944
          -10 ORD        138      733
          -# … with 336,766 more rows
          -
          # Since they are sequential columns in the dataset
          -flights %>% 
          -  select(dest:distance)
          -
          # A tibble: 336,776 x 3
          -   dest  air_time distance
          -   <chr>    <dbl>    <dbl>
          - 1 IAH        227     1400
          - 2 IAH        227     1416
          - 3 MIA        160     1089
          - 4 BQN        183     1576
          - 5 ATL        116      762
          - 6 ORD        150      719
          - 7 FLL        158     1065
          - 8 IAD         53      229
          - 9 MCO        140      944
          -10 ORD        138      733
          -# … with 336,766 more rows
          -
          # Not as effective, by removing everything else
          -flights %>% 
          -  select(-year, -month, -day, -dep_time, -sched_dep_time, -dep_delay, -arr_time,
          -         -sched_arr_time, -arr_delay, -carrier, -flight, -tailnum, -origin, 
          -         -hour, -minute, -time_hour)
          -
          # A tibble: 336,776 x 6
          -   dest  air_time distance  gain hours gain_per_hour
          -   <chr>    <dbl>    <dbl> <dbl> <dbl>         <dbl>
          - 1 IAH        227     1400    -9 3.78          -2.38
          - 2 IAH        227     1416   -16 3.78          -4.23
          - 3 MIA        160     1089   -31 2.67         -11.6 
          - 4 BQN        183     1576    17 3.05           5.57
          - 5 ATL        116      762    19 1.93           9.83
          - 6 ORD        150      719   -16 2.5           -6.4 
          - 7 FLL        158     1065   -24 2.63          -9.11
          - 8 IAD         53      229    11 0.883         12.5 
          - 9 MCO        140      944     5 2.33           2.14
          -10 ORD        138      733   -10 2.3           -4.35
          -# … with 336,766 more rows
          -

          (LC5.16) How could one use starts_with, ends_with, and contains to select columns from the flights data frame? Provide three different examples in total: one for starts_with, one for ends_with, and one for contains.

          +

          (LC4.17) How could one use starts_with, ends_with, and contains to select columns from the flights data frame? Provide three different examples in total: one for starts_with, one for ends_with, and one for contains.

          Solution:

          -
          # Anything that starts with "d"
          -flights %>% 
          -  select(starts_with("d"))
          -
          # A tibble: 336,776 x 5
          -     day dep_time dep_delay dest  distance
          -   <int>    <int>     <dbl> <chr>    <dbl>
          - 1     1      517         2 IAH       1400
          - 2     1      533         4 IAH       1416
          - 3     1      542         2 MIA       1089
          - 4     1      544        -1 BQN       1576
          - 5     1      554        -6 ATL        762
          - 6     1      554        -4 ORD        719
          - 7     1      555        -5 FLL       1065
          - 8     1      557        -3 IAD        229
          - 9     1      557        -3 MCO        944
          -10     1      558        -2 ORD        733
          -# … with 336,766 more rows
          -
          # Anything related to delays:
          -flights %>% 
          -  select(ends_with("delay"))
          -
          # A tibble: 336,776 x 2
          -   dep_delay arr_delay
          -       <dbl>     <dbl>
          - 1         2        11
          - 2         4        20
          - 3         2        33
          - 4        -1       -18
          - 5        -6       -25
          - 6        -4        12
          - 7        -5        19
          - 8        -3       -14
          - 9        -3        -8
          -10        -2         8
          -# … with 336,766 more rows
          -
          # Anything related to departures:
          -flights %>% 
          -  select(contains("dep"))
          -
          # A tibble: 336,776 x 3
          -   dep_time sched_dep_time dep_delay
          -      <int>          <int>     <dbl>
          - 1      517            515         2
          - 2      533            529         4
          - 3      542            540         2
          - 4      544            545        -1
          - 5      554            600        -6
          - 6      554            558        -4
          - 7      555            600        -5
          - 8      557            600        -3
          - 9      557            600        -3
          -10      558            600        -2
          -# … with 336,766 more rows
          -

          (LC5.17) Why might we want to use the select() function on a data frame?

          +

          (LC4.18) Why might we want to use the select() function on a data frame?

          Solution: To narrow down the data frame, to make it easier to look at. Using View() for example.

          -

          (LC5.18) Create a new data frame that shows the top 5 airports with the largest arrival delays from NYC in 2013.

          +

          (LC4.19) Create a new data frame that shows the top 5 airports with the largest arrival delays from NYC in 2013.

          Solution:

          -
          top_five <- flights %>% 
          -  group_by(dest) %>% 
          -  summarize(avg_delay = mean(arr_delay, na.rm = TRUE)) %>% 
          -  arrange(desc(avg_delay)) %>% 
          -  top_n(n = 5)
          -top_five
          -
          # A tibble: 5 x 2
          -  dest  avg_delay
          -  <chr>     <dbl>
          -1 CAE        41.8
          -2 TUL        33.7
          -3 OKC        30.6
          -4 JAC        28.1
          -5 TYS        24.1
          -

          (LC5.19) Using the datasets included in the nycflights13 package, compute the available seat miles for each airline sorted in descending order. After completing all the necessary data wrangling steps, the resulting data frame should have 16 rows (one for each airline) and 2 columns (airline name and available seat miles). Here are some hints:

          +

          (LC4.20) Using the datasets included in the nycflights13 package, compute the available seat miles for each airline sorted in descending order. After completing all the necessary data wrangling steps, the resulting data frame should have 16 rows (one for each airline) and 2 columns (airline name and available seat miles). Here are some hints:

1. Crucial: Unless you are very confident in what you are doing, it is worthwhile not to start coding right away, but rather to first sketch out on paper all the necessary data wrangling steps, not using exact code but rather high-level pseudocode that is informal yet detailed enough to articulate what you are doing. This way you won't confuse what you are trying to do (the algorithm) with how you are going to do it (writing dplyr code).
          2. Take a close look at all the datasets using the View() function: flights, weather, planes, airports, and airlines to identify which variables are necessary to compute available seat miles.
@@ -2174,185 +937,77 @@

            D.4 Chapter 5 Solutions

          4. Consider the data wrangling verbs in Table 4.1 as your toolbox!

          Solution: Here are some examples of student-written pseudocode. Based on our own pseudocode, let’s first display the entire solution.

          -
          flights %>% 
          -  inner_join(planes, by = "tailnum") %>% 
          -  select(carrier, seats, distance) %>% 
          -  mutate(ASM = seats * distance) %>% 
          -  group_by(carrier) %>% 
          -  summarize(ASM = sum(ASM, na.rm = TRUE)) %>% 
          -  arrange(desc(ASM))
          -
          # A tibble: 16 x 2
          -   carrier         ASM
          -   <chr>         <dbl>
          - 1 UA      15516377526
          - 2 DL      10532885801
          - 3 B6       9618222135
          - 4 AA       3677292231
          - 5 US       2533505829
          - 6 VX       2296680778
          - 7 EV       1817236275
          - 8 WN       1718116857
          - 9 9E        776970310
          -10 HA        642478122
          -11 AS        314104736
          -12 FL        219628520
          -13 F9        184832280
          -14 YV         20163632
          -15 MQ          7162420
          -16 OO          1299835

          Let’s now break this down step-by-step. To compute the available seat miles for a given flight, we need the distance variable from the flights data frame and the seats variable from the planes data frame, necessitating a join by the key variable tailnum as illustrated in Figure 4.7. To keep the resulting data frame easy to view, we’ll select() only these two variables and carrier:

          -
          flights %>% 
          -  inner_join(planes, by = "tailnum") %>% 
          -  select(carrier, seats, distance)
          -
          # A tibble: 284,170 x 3
          -   carrier seats distance
          -   <chr>   <int>    <dbl>
          - 1 UA        149     1400
          - 2 UA        149     1416
          - 3 AA        178     1089
          - 4 B6        200     1576
          - 5 DL        178      762
          - 6 UA        191      719
          - 7 B6        200     1065
          - 8 EV         55      229
          - 9 B6        200      944
          -10 B6        200     1028
          -# … with 284,160 more rows

          Now for each flight we can compute the available seat miles ASM by multiplying the number of seats by the distance via a mutate():

          -
          flights %>% 
          -  inner_join(planes, by = "tailnum") %>% 
          -  select(carrier, seats, distance) %>% 
          -  # Added:
          -  mutate(ASM = seats * distance)
          -
          # A tibble: 284,170 x 4
          -   carrier seats distance    ASM
          -   <chr>   <int>    <dbl>  <dbl>
          - 1 UA        149     1400 208600
          - 2 UA        149     1416 210984
          - 3 AA        178     1089 193842
          - 4 B6        200     1576 315200
          - 5 DL        178      762 135636
          - 6 UA        191      719 137329
          - 7 B6        200     1065 213000
          - 8 EV         55      229  12595
          - 9 B6        200      944 188800
          -10 B6        200     1028 205600
          -# … with 284,160 more rows

          Next we want to sum the ASM for each carrier. We achieve this by first grouping by carrier and then summarizing using the sum() function:

          -
          flights %>% 
          -  inner_join(planes, by = "tailnum") %>% 
          -  select(carrier, seats, distance) %>% 
          -  mutate(ASM = seats * distance) %>% 
          -  # Added:
          -  group_by(carrier) %>% 
          -  summarize(ASM = sum(ASM))
          -
          # A tibble: 16 x 2
          -   carrier         ASM
          -   <chr>         <dbl>
          - 1 9E        776970310
          - 2 AA       3677292231
          - 3 AS        314104736
          - 4 B6       9618222135
          - 5 DL      10532885801
          - 6 EV       1817236275
          - 7 F9        184832280
          - 8 FL        219628520
          - 9 HA        642478122
          -10 MQ          7162420
          -11 OO          1299835
          -12 UA      15516377526
          -13 US       2533505829
          -14 VX       2296680778
          -15 WN       1718116857
          -16 YV         20163632
          -

          However, because for certain carriers certain flights have missing NA values, the resulting table also returns NA’s. We can eliminate these by adding a na.rm = TRUE argument to sum(), telling R that we want to remove the NA’s in the sum. We saw this in Section (summarize):

          -
          flights %>% 
          -  inner_join(planes, by = "tailnum") %>% 
          -  select(carrier, seats, distance) %>% 
          -  mutate(ASM = seats * distance) %>% 
          -  group_by(carrier) %>% 
          -  # Modified:
          -  summarize(ASM = sum(ASM, na.rm = TRUE))
          -
          # A tibble: 16 x 2
          -   carrier         ASM
          -   <chr>         <dbl>
          - 1 9E        776970310
          - 2 AA       3677292231
          - 3 AS        314104736
          - 4 B6       9618222135
          - 5 DL      10532885801
          - 6 EV       1817236275
          - 7 F9        184832280
          - 8 FL        219628520
          - 9 HA        642478122
          -10 MQ          7162420
          -11 OO          1299835
          -12 UA      15516377526
          -13 US       2533505829
          -14 VX       2296680778
          -15 WN       1718116857
          -16 YV         20163632
          +

          However, because for certain carriers certain flights have missing NA values, the resulting table also returns NA’s. We can eliminate these by adding a na.rm = TRUE argument to sum(), telling R that we want to remove the NA’s in the sum. We saw this in Section 4.3:

          Finally, we arrange() the data in desc()ending order of ASM.

          -
          flights %>% 
          -  inner_join(planes, by = "tailnum") %>% 
          -  select(carrier, seats, distance) %>% 
          -  mutate(ASM = seats * distance) %>% 
          -  group_by(carrier) %>% 
          -  summarize(ASM = sum(ASM, na.rm = TRUE)) %>% 
          -  # Added:
          -  arrange(desc(ASM))
          -
          # A tibble: 16 x 2
          -   carrier         ASM
          -   <chr>         <dbl>
          - 1 UA      15516377526
          - 2 DL      10532885801
          - 3 B6       9618222135
          - 4 AA       3677292231
          - 5 US       2533505829
          - 6 VX       2296680778
          - 7 EV       1817236275
          - 8 WN       1718116857
          - 9 9E        776970310
          -10 HA        642478122
          -11 AS        314104736
          -12 FL        219628520
          -13 F9        184832280
          -14 YV         20163632
          -15 MQ          7162420
          -16 OO          1299835

While the above data frame is correct, the IATA carrier code is not always useful. For example, what carrier is WN? We can address this by joining with the airlines dataset using carrier as the key variable. While this step is not absolutely required, it goes a long way to making the table easier to make sense of. It is important to be empathetic with the ultimate consumers of your presented data!

          -
          flights %>% 
          -  inner_join(planes, by = "tailnum") %>% 
          -  select(carrier, seats, distance) %>% 
          -  mutate(ASM = seats * distance) %>% 
          -  group_by(carrier) %>% 
          -  summarize(ASM = sum(ASM, na.rm = TRUE)) %>% 
          -  arrange(desc(ASM)) %>% 
          -  # Added:
          -  inner_join(airlines, by = "carrier")
          -
          # A tibble: 16 x 3
          -   carrier         ASM name                       
          -   <chr>         <dbl> <chr>                      
          - 1 UA      15516377526 United Air Lines Inc.      
          - 2 DL      10532885801 Delta Air Lines Inc.       
          - 3 B6       9618222135 JetBlue Airways            
          - 4 AA       3677292231 American Airlines Inc.     
          - 5 US       2533505829 US Airways Inc.            
          - 6 VX       2296680778 Virgin America             
          - 7 EV       1817236275 ExpressJet Airlines Inc.   
          - 8 WN       1718116857 Southwest Airlines Co.     
          - 9 9E        776970310 Endeavor Air Inc.          
          -10 HA        642478122 Hawaiian Airlines Inc.     
          -11 AS        314104736 Alaska Airlines Inc.       
          -12 FL        219628520 AirTran Airways Corporation
          -13 F9        184832280 Frontier Airlines Inc.     
          -14 YV         20163632 Mesa Airlines Inc.         
          -15 MQ          7162420 Envoy Air                  
          -16 OO          1299835 SkyWest Airlines Inc.      
          +
          +
          +
          +

          D.4 Chapter 5 Solutions

          +
          library(dplyr)
          +library(ggplot2)
          +library(nycflights13)
          +library(tidyr)
          +library(readr)
          +

          (LC5.1) What are common characteristics of “tidy” datasets?

          +

          Solution: Rows correspond to observations, while columns correspond to variables.

          +

          (LC5.2) What makes “tidy” datasets useful for organizing data?

          +

          Solution: Tidy datasets are an organized way of viewing data. This format is required for the ggplot2 and dplyr packages for data visualization and wrangling.

          +

          (LC5.3) Take a look the airline_safety data frame included in the fivethirtyeight data. Run the following:

          +
          airline_safety
          +

After reading the help file by running ?airline_safety, we see that airline_safety is a data frame containing information on different airline companies' safety records. This data was originally reported on the data journalism website FiveThirtyEight.com in Nate Silver's article "Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?". Let's ignore the incl_reg_subsidiaries and avail_seat_km_per_week variables for simplicity:

          +
          airline_safety_smaller <- airline_safety %>% 
          +  select(-c(incl_reg_subsidiaries, avail_seat_km_per_week))
          +airline_safety_smaller
          +
          # A tibble: 56 x 7
          +   airline incidents_85_99 fatal_accidents… fatalities_85_99 incidents_00_14
          +   <chr>             <int>            <int>            <int>           <int>
          + 1 Aer Li…               2                0                0               0
          + 2 Aerofl…              76               14              128               6
          + 3 Aeroli…               6                0                0               1
          + 4 Aerome…               3                1               64               5
          + 5 Air Ca…               2                0                0               2
          + 6 Air Fr…              14                4               79               6
          + 7 Air In…               2                1              329               4
          + 8 Air Ne…               3                0                0               5
          + 9 Alaska…               5                0                0               5
          +10 Alital…               7                2               50               4
          +# … with 46 more rows, and 2 more variables: fatal_accidents_00_14 <int>,
          +#   fatalities_00_14 <int>
          +

This data frame is not in "tidy" format. How would you convert this data frame to be in "tidy" format, in particular so that it has a variable incident_type_years indicating the incident type/year and a variable count of the counts?

          +

          Solution: Using the gather() function from the tidyr package:

          +
          airline_safety_smaller_tidy <- airline_safety_smaller %>% 
          +  gather(key = incident_type_years, value = count, -airline)
          +airline_safety_smaller_tidy
          +
          # A tibble: 336 x 3
          +   airline               incident_type_years count
          +   <chr>                 <chr>               <int>
          + 1 Aer Lingus            incidents_85_99         2
          + 2 Aeroflot              incidents_85_99        76
          + 3 Aerolineas Argentinas incidents_85_99         6
          + 4 Aeromexico            incidents_85_99         3
          + 5 Air Canada            incidents_85_99         2
          + 6 Air France            incidents_85_99        14
          + 7 Air India             incidents_85_99         2
          + 8 Air New Zealand       incidents_85_99         3
          + 9 Alaska Airlines       incidents_85_99         5
          +10 Alitalia              incidents_85_99         7
          +# … with 326 more rows
          +

          If you look at the resulting airline_safety_smaller_tidy data frame in the spreadsheet viewer, you’ll see that the variable incident_type_years has 6 possible values: "incidents_85_99", "fatal_accidents_85_99", "fatalities_85_99", "incidents_00_14", "fatal_accidents_00_14", "fatalities_00_14" corresponding to the 6 columns of airline_safety_smaller we tidied.

          +

          (LC5.4) Convert the dem_score data frame into a tidy data frame and assign the name of dem_score_tidy to the resulting long-formatted data frame.

          +

          Solution: Running the following in the console:

          +
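Presumably the same gather() call as in the Chapter 4 solutions above (assuming tidyr and the dem_score data frame are loaded):

dem_score_tidy <- gather(data = dem_score, key = year, value = democracy_score, -country)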

          Let’s now compare the dem_score and dem_score_tidy. dem_score has democracy score information for each year in columns, whereas in dem_score_tidy there are explicit variables year and democracy_score. While both representations of the data contain the same information, we can only use ggplot() to create plots using the dem_score_tidy data frame.

          +

          (LC5.5) Read in the life expectancy data stored at https://moderndive.com/data/le_mess.csv and convert it to a tidy data frame.

          +

          Solution: The code is similar

          +
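Presumably the same code as in the Chapter 4 solutions above (assuming readr and tidyr are loaded):

life_expectancy <- read_csv('https://moderndive.com/data/le_mess.csv')
life_expectancy_tidy <- gather(data = life_expectancy, key = year, value = life_expectancy, -country)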

We observe the same structure with respect to year in life_expectancy vs life_expectancy_tidy as we did in dem_score vs dem_score_tidy:


          D.5 Chapter 6 Solutions

          +

          To come!

          library(ggplot2)
           library(dplyr)
           library(moderndive)
          diff --git a/docs/images/accuracy_vs_precision.jpg b/docs/images/accuracy_vs_precision.jpg
          new file mode 100644
          index 000000000..8c5c7d131
          Binary files /dev/null and b/docs/images/accuracy_vs_precision.jpg differ
          diff --git a/docs/images/accuracy_vs_precision.png b/docs/images/accuracy_vs_precision.png
          new file mode 100644
          index 000000000..0c1edcafa
          Binary files /dev/null and b/docs/images/accuracy_vs_precision.png differ
          diff --git a/docs/images/crash-test-dummy.jpg b/docs/images/crash-test-dummy.jpg
          new file mode 100644
          index 000000000..3364e6598
          Binary files /dev/null and b/docs/images/crash-test-dummy.jpg differ
          diff --git a/docs/images/crc_press.jpg b/docs/images/crc_press.jpg
          new file mode 100644
          index 000000000..c7a78c667
          Binary files /dev/null and b/docs/images/crc_press.jpg differ
          diff --git a/docs/images/flight-simulator.jpg b/docs/images/flight-simulator.jpg
          new file mode 100644
          index 000000000..7a5fe9df4
          Binary files /dev/null and b/docs/images/flight-simulator.jpg differ
          diff --git a/docs/images/import-cheatsheet-1.png b/docs/images/import_cheatsheet-1.png
          similarity index 100%
          rename from docs/images/import-cheatsheet-1.png
          rename to docs/images/import_cheatsheet-1.png
          diff --git a/docs/images/import-cheatsheet-2.png b/docs/images/import_cheatsheet-2.png
          similarity index 100%
          rename from docs/images/import-cheatsheet-2.png
          rename to docs/images/import_cheatsheet-2.png
          diff --git a/docs/index.html b/docs/index.html
          index d7b0cea89..012644159 100644
          --- a/docs/index.html
          +++ b/docs/index.html
          @@ -6,20 +6,20 @@
             
             
             Statistical Inference via Data Science
          -  
          +  
             
           
             
             
             
             
          -  
          +  
             
           
             
             
             
          -  
          +  
             
           
           
          @@ -214,9 +214,10 @@
           
        38. 4.5 mutate existing variables
        39. 4.6 arrange and sort rows
        40. 4.7 join data frames
        41. 4.8 Other verbs
43. Statistical inference: Once again using your newly acquired data science tools, we'll unpack statistical inference using the infer package. In particular:
• Ch.8: Understanding the role that sampling variability plays in statistical inference using both tactile and virtual simulations of sampling from a "bowl" with an unknown proportion of red balls.
• Ch.9: Building confidence intervals.
• Ch.10: Conducting hypothesis tests.
45. Data modeling revisited: Armed with your new understanding of statistical inference, you'll revisit and review the models you constructed in Ch.6 & Ch.7. In particular:
• Ch.11: Interpreting both the statistical and practical significance of the results of the models.

          We’ll end with a discussion on what it means to “think with data” in Chapter 12 and present an example case study data analysis of house prices in Seattle.

ModernDive Flowchart

@@ -696,48 +697,38 @@

          1.2.1 Who is this book for?

This book is intended for instructors of traditional introductory statistics classes using RStudio, either the desktop or server version, who would like to inject more data science topics into their syllabus. We assume that students taking the class will have no prior algebra, calculus, or programming/coding experience.

          Here are some principles and beliefs we kept in mind while writing this text. If you agree with them, this might be the book for you.

1. Blur the lines between lecture and lab
• With increased availability and accessibility of laptops and open-source non-proprietary statistical software, the strict dichotomy between lab and lecture can be loosened.
• It's much harder for students to understand the importance of using software if they only use it once a week or less. They forget the syntax in much the same way someone learning a foreign language forgets the rules. Frequent reinforcement is key.
2. Focus on the entire data/science research pipeline
3. It's all about the data
• We leverage R packages for rich, real, and realistic data-sets that at the same time are easy-to-load into R, such as the nycflights13 and fivethirtyeight packages.
• We believe that data visualization is a gateway drug for statistics and that the Grammar of Graphics as implemented in the ggplot2 package is the best way to impart such lessons. However, we often hear: "You can't teach ggplot2 for data visualization in intro stats!" We, like David Robinson, are much more optimistic.
• dplyr has made data wrangling much more accessible to novices, and hence much more interesting data-sets can be explored.
4. Use simulation/resampling to introduce statistical inference, not probability/mathematical formulas
• Instead of using formulas, large-sample approximations, and probability tables, we teach statistical concepts using resampling-based inference.
• This allows for a de-emphasis of traditional probability topics, freeing up room in the syllabus for other topics.
5. Don't fence off students from the computation pool, throw them in!
• Computing skills are essential to working with data in the 21st century. Given this fact, we feel that to shield students from computing is to ultimately do them a disservice.
• We are not teaching a course on coding/programming per se, but rather just enough of the computational and algorithmic thinking necessary for data analysis.
6. Complete reproducibility and customizability
• We are frustrated when textbooks give examples, but not the source code and the data itself. We give you the source code for all examples as well as the whole book!
• Ultimately the best textbook is one you've written yourself. You know best your audience, their background, and their priorities. You know best your own style and the types of examples and problems you like best. Customization is the ultimate end. For more about how to make this book your own, see About this Book.
          @@ -822,17 +813,24 @@

          1.4 Connect and contribute

          1.5 About this book

          This book was written using RStudio’s bookdown package by Yihui Xie (Xie 2018). This package simplifies the publishing of books by having all content written in R Markdown. The bookdown/R Markdown source code for all versions of ModernDive is available on GitHub:

          Could this be a new paradigm for textbooks? Instead of the traditional model of textbook companies publishing updated editions of the textbook every few years, we apply a software design influenced model of publishing more easily updated versions. We can then leverage open-source communities of instructors and developers for ideas, tools, resources, and feedback. As such, we welcome your pull requests.

          Finally, feel free to modify the book as you wish for your own needs, but please list the authors at the top of index.Rmd as “Chester Ismay, Albert Y. Kim, and YOU!”

          @@ -857,16 +855,20 @@

          1.6 About the authors

          Here, with us assuming the two population means are equal (\(H_0: \mu_r - \mu_a = 0\)), we can look at this from a tactile point of view by using index cards. There are \(n_r = 34\) data elements corresponding to romance movies and \(n_a = 34\) for action movies. We can write the 34 ratings from our sample for romance movies on one set of 34 index cards and the 34 ratings for action movies on another set of 34 index cards. (Note that the sample sizes need not be the same.)

          -

          The next step is to put the two stacks of index cards together, creating a new set of 68 cards. If we assume that the two population means are equal, we are saying that there is no association between ratings and genre (romance vs action). We can use the index cards to create two new stacks for romance and action movies. Note that the new “romance movie stack” will likely have some of the original action movies in it and likewise for the “action movie stack” including some romance movies from our original set. Since we are assuming that each card is equally likely to have appeared in either one of the stacks this makes sense. First, we must shuffle all the cards thoroughly. After doing so, in this case with equal values of sample sizes, we split the deck in half.

          +

          The next step is to put the two stacks of index cards together, creating a new set of 68 cards. If we assume that the two population means are equal, we are saying that there is no association between ratings and genre (romance vs action). We can use the index cards to create two new stacks for romance and action movies. First, we must shuffle all the cards thoroughly. After doing so, in this case with equal values of sample sizes, we split the deck in half.

          We then calculate the new sample mean rating of the romance deck, and also the new sample mean rating of the action deck. This creates one simulation of the samples that were collected originally. We next want to calculate a statistic from these two samples. Instead of actually doing the calculation using index cards, we can use R as we have before to simulate this process. Let’s do this just once and compare the results to what we see in movies_genre_sample.

          -
          movies_genre_sample %>% 
          -  specify(formula = rating ~ genre) %>%
          -  hypothesize(null = "independence") %>% 
          -  generate(reps = 1) %>% 
          -  calculate(stat = "diff in means", order = c("Romance", "Action"))
          -
          # A tibble: 1 x 1
          -   stat
          -  <dbl>
          -1 0.515
          +
          shuffled_ratings_old <- #movies_trimmed %>%
          +  movies_genre_sample %>% 
          +     mutate(genre = mosaic::shuffle(genre)) %>% 
          +     group_by(genre) %>%
          +     summarize(mean = mean(rating))
          +diff(shuffled_ratings_old$mean)
          +
          [1] 0.126
          +
          permuted_ratings <- movies_genre_sample %>% 
          +  specify(formula = rating ~ genre) %>% 
          +  generate(reps = 1)

Learning check

@@ -951,7 +930,7 @@

          11.7.8 Simulated data

          -

          11.7.9 Distribution of \(\delta\) under \(H_0\)

          +

          10.7.9 Distribution of \(\delta\) under \(H_0\)

          The generate() step completes a permutation sending values of ratings to potentially different values of genre from which they originally came. It simulates a shuffling of the ratings between the two levels of genre just as we could have done with index cards. We can now proceed in a similar way to what we have done previously with bootstrapping by repeating this process many times to create simulated samples, assuming the null hypothesis is true.

          generated_samples <- movies_genre_sample %>% 
             specify(formula = rating ~ genre) %>% 
@@ -960,31 +939,31 @@ 11.7.9 Distribution of

A null distribution of simulated differences in sample means is created with the specification of stat = "diff in means" for the calculate() step. The null distribution is similar to the bootstrap distribution we saw in Chapter 9, but remember that it consists of statistics generated assuming the null hypothesis is true.
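The generate() and calculate() steps of the pipeline above are cut off by the diff hunk. As a point of reference only, a sketch of what the full pipeline likely looks like, following the same infer pattern used earlier in the chapter (the reps value of 5000 is assumed from the surrounding text):

# Sketch of the assumed full pipeline for the null distribution
null_distribution_two_means <- movies_genre_sample %>% 
  specify(formula = rating ~ genre) %>% 
  hypothesize(null = "independence") %>% 
  generate(reps = 5000, type = "permute") %>% 
  calculate(stat = "diff in means", order = c("Romance", "Action"))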

          We can now plot the distribution of these simulated differences in means:

          null_distribution_two_means %>% visualize()
-FIGURE 11.7: Simulated differences in means histogram
+Figure 10.7: Simulated differences in means histogram

          -

          11.7.10 The p-value

          +

          10.7.10 The p-value

          Remember that we are interested in seeing where our observed sample mean difference of 0.95 falls on this null/randomization distribution. We are interested in simply a difference here so “more extreme” corresponds to values in both tails on the distribution. Let’s shade our null distribution to show a visual representation of our \(p\)-value:

          null_distribution_two_means %>% 
             visualize(obs_stat = obs_diff, direction = "both")
-FIGURE 11.8: Shaded histogram to show p-value
+Figure 10.8: Shaded histogram to show p-value

          Remember that the observed difference in means was 0.95. We have shaded red all values at or above that value and also shaded red those values at or below its negative value (since this is a two-tailed test). By giving obs_stat = obs_diff a vertical darker line is also shown at 0.95. To better estimate how large the \(p\)-value will be, we also increase the number of bins to 100 here from 20:

          null_distribution_two_means %>% 
             visualize(bins = 100, obs_stat = obs_diff, direction = "both")
-FIGURE 11.9: Histogram with vertical lines corresponding to observed statistic
+Figure 10.9: Histogram with vertical lines corresponding to observed statistic

          At this point, it is important to take a guess as to what the \(p\)-value may be. We can see that there are only a few permuted differences as extreme or more extreme than our observed effect (in both directions). Maybe we guess that this \(p\)-value is somewhere around 2%, or maybe 3%, but certainly not 30% or more. Lastly, we calculate the \(p\)-value directly using infer:

          @@ -994,11 +973,11 @@

          11.7.10 The p-value

          # A tibble: 1 x 1
             p_value
               <dbl>
          -1   0.006
          -

          We have around 0.6% of values as extreme or more extreme than our observed statistic in both directions. Assuming we are using a 5% significance level for \(\alpha\), we have evidence supporting the conclusion that the mean rating for romance movies is different from that of action movies. The next important idea is to better understand just how much higher of a mean rating can we expect the romance movies to have compared to that of action movies.

          +1 0.0046

          +

          We have around 0.46% of values as extreme or more extreme than our observed statistic in both directions. Assuming we are using a 5% significance level for \(\alpha\), we have evidence supporting the conclusion that the mean rating for romance movies is different from that of action movies. The next important idea is to better understand just how much higher of a mean rating can we expect the romance movies to have compared to that of action movies.

          -

          11.7.11 Corresponding confidence interval

          +

          10.7.11 Corresponding confidence interval

          One of the great things about the infer pipeline is that going between hypothesis tests and confidence intervals is incredibly simple. To create a null distribution, we ran

          null_distribution_two_means <- movies_genre_sample %>% 
             specify(formula = rating ~ genre) %>% 
          @@ -1035,7 +1014,7 @@ 

          11.7.11 Corresponding confidence

          -

          11.7.12 Summary

          +

          10.7.12 Summary

          To review, these are the steps one would take whenever you’d like to do a hypothesis test comparing values from the distributions of two groups:

            @@ -1054,13 +1033,13 @@

            11.7.12 Summary

          -

          11.8 Building theory-based methods using computation

          +

          10.8 Building theory-based methods using computation

          As a point of reference, we will now discuss the traditional theory-based way to conduct the hypothesis test for determining if there is a statistically significant difference in the sample mean rating of Action movies versus Romance movies. This method and ones like it work very well when the assumptions are met in order to run the test. They are based on probability models and distributions such as the normal and \(t\)-distributions.

These traditional methods date back to a time when researchers didn’t have access to computers that could run 5000 simulations in a few seconds, so they had to base their methods on probability theory instead. Many fields and researchers continue to use these methods, and that is the biggest reason for their inclusion here. It’s important to remember that a \(t\)-test or a \(z\)-test is really just an approximation of what you have already seen in this chapter using simulation and randomization. The focus here is on understanding how the shape of the \(t\)-curve comes about without digging too deeply into the mathematical underpinnings.

          -

          11.8.1 Example: \(t\)-test for two independent samples

          +

          10.8.1 Example: \(t\)-test for two independent samples

What is commonly done in statistics is the process of normalization. What this entails is calculating the mean and standard deviation of a variable. Then you subtract the mean from each value of your variable and divide by the standard deviation. The most common normalization is known as the \(z\)-score. The formula for a \(z\)-score is \[Z = \frac{x - \mu}{\sigma},\] where \(x\) represents the value of a variable, \(\mu\) represents the mean of the variable, and \(\sigma\) represents the standard deviation of the variable. Thus, if your variable has 10 elements, each one has a corresponding \(z\)-score that gives how many standard deviations away that value is from its mean. If the variable itself is (approximately) normally distributed, its \(z\)-scores follow the standard normal distribution with mean 0 and standard deviation 1, which has the common, bell-shaped pattern seen below.
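As a quick illustration of this normalization, here is a minimal sketch in R using made-up values (the vector x is hypothetical and not taken from the movies data):

# Hypothetical values, for illustration only
x <- c(2.1, 3.5, 4.0, 4.4, 5.8, 6.2, 7.0, 7.3, 8.1, 9.6)

# z-score: subtract the mean, then divide by the standard deviation
z <- (x - mean(x)) / sd(x)
z

# scale() performs the same standardization
as.numeric(scale(x))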

          -

          +

Recall that we hardly ever know the mean and standard deviation of the population of interest. This is almost always the case when considering the means of two independent groups. To help account for not knowing the population parameter values, we can use the sample statistics instead, but this comes with a bit of a price in terms of complexity.

          Another form of normalization occurs when we need to use the sample standard deviations as estimates for the unknown population standard deviations. This normalization is often called the \(t\)-score. For the two independent samples case like what we have for comparing action movies to romance movies, the formula is \[T =\dfrac{ (\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{ \sqrt{\dfrac{{s_1}^2}{n_1} + \dfrac{{s_2}^2}{n_2}} }\]

          There is a lot to try to unpack here.
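To see how each piece of the formula maps to a calculation, here is a minimal sketch in R. The summary statistics below are hypothetical placeholders (only the sample sizes of 34 match the text):

# Hypothetical summary statistics for two independent samples
xbar_1 <- 4.0; s_1 <- 1.0; n_1 <- 34   # e.g. romance movies
xbar_2 <- 3.0; s_2 <- 1.1; n_2 <- 34   # e.g. action movies

# Two-sample T statistic, using the null value mu_1 - mu_2 = 0
T_stat <- ((xbar_1 - xbar_2) - 0) / sqrt(s_1^2 / n_1 + s_2^2 / n_2)
T_stat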

@@ -1080,10 +1059,10 @@ 11.8.1 Example:

We saw what the null distribution of \(\delta = \bar{x}_1 - \bar{x}_2\) looks like using randomization above. Recall this distribution:

          ggplot(data = null_distribution_two_means, aes(x = stat)) +
             geom_histogram(color = "white", bins = 20)
-FIGURE 11.10: Simulated differences in means histogram
+Figure 10.10: Simulated differences in means histogram

          The infer package also includes some built-in theory-based statistics as well, so instead of going through the process of trying to transform the difference into a standardized form, we can just provide a different value for stat in calculate(). Recall the generated_samples data frame created via:

@@ -1095,12 +1074,12 @@ 11.8.1 Example:

null_distribution_t <- generated_samples %>% 
  calculate(stat = "t", order = c("Romance", "Action"))
null_distribution_t %>% visualize()

          +

          We see that the shape of this stat = "t" distribution is the same as that of stat = "diff in means". The scale has changed though with the \(t\) values having less spread than the difference in means.

          A traditional \(t\)-test doesn’t look at this simulated distribution, but instead it looks at the \(t\)-curve with degrees of freedom equal to 62.029. We can overlay this distribution over the top of our permuted \(t\) statistics using the method = "both" setting in visualize().

          null_distribution_t %>% 
             visualize(method = "both")
          -

          +

We can see that the curve does a good job of approximating the randomization distribution here. (More on when to expect this to be the case when we discuss conditions for the \(t\)-test in a bit.) To calculate the \(p\)-value in this case, we need to figure out how much of the total area under the \(t\)-curve is at or above our observed \(T\)-statistic, plus the area at or below the negative of the observed \(T\)-statistic. (Remember this is a two-tailed test so we are looking for a difference–values in the tails of either direction.) Just as we converted all of the simulated values to \(T\)-statistics, we must also do so for our observed effect \(\delta^*\):

          obs_t <- movies_genre_sample %>% 
             specify(formula = rating ~ genre) %>% 
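# The obs_t pipeline above is truncated by the diff hunk; a likely completion
# (an assumption, matching the stat = "t" calculation used for the null distribution):
obs_t <- movies_genre_sample %>% 
  specify(formula = rating ~ genre) %>% 
  calculate(stat = "t", order = c("Romance", "Action"))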
@@ -1108,11 +1087,11 @@ 11.8.1 Example:

null_distribution_t %>% 
  visualize(method = "both", obs_stat = obs_t, direction = "both")

          -

          +

          As we might have expected with this just being a standardization of the difference in means statistic that produced a small \(p\)-value, we also have a very small one here.

          -

          11.8.2 Conditions for t-test

          +

          10.8.2 Conditions for t-test

          The infer package does not automatically check conditions for the theoretical methods to work and this warning was given when we used method = "both". In order for the results of the \(t\)-test to be valid, three conditions must be met:

1. Independent observations in both samples
2. Nearly normal populations or large sample sizes (\(n \ge 30\))

@@ -1120,36 +1099,29 @@ 11.8.2 Conditions for t-test

3. Independently selected samples

          Condition 1: This is met since we sampled at random using R from our population.

          -

          Condition 2: Recall from Figure 11.4, that we know how the populations are distributed. Both of them are close to normally distributed. If we are a little concerned about this assumption, we also do have samples of size larger than 30 (\(n_1 = n_2 = 34\)).

          +

          Condition 2: Recall from Figure 10.4, that we know how the populations are distributed. Both of them are close to normally distributed. If we are a little concerned about this assumption, we also do have samples of size larger than 30 (\(n_1 = n_2 = 34\)).

          Condition 3: This is met since there is no natural pairing of a movie in the Action group to a movie in the Romance group.

Since all three conditions are met, we can be reasonably certain that the theory-based test will match the results of the randomization-based test using shuffling. Remember that theory-based tests can produce incorrect results if these assumptions are not carefully checked. The only assumption for randomization and computational-based methods is that the sample is selected at random. These methods are our preference and we strongly believe they should be yours as well, but it’s also important to see how theory-based tests can be done and used as an approximation for the computational techniques, at least until more researchers adopt the techniques that utilize the power of computers.

          -

          -
          -

          11.9 Conclusion

          -

          We conclude by showing the infer pipeline diagram. In Chapter 12, we’ll come back to regression and see how the ideas covered in Chapter 9 and this chapter can help in understanding the significance of predictors in modeling.

          +
          +

          10.9 Conclusion

          +

          We conclude by showing the infer pipeline diagram. In Chapter 11, we’ll come back to regression and see how the ideas covered in Chapter 9 and this chapter can help in understanding the significance of predictors in modeling.

          -
          -

          11.9.1 Script of R code

          -

          An R script file of all R code used in this chapter is available here.

          +
          +

          10.9.1 Script of R code

          +

          An R script file of all R code used in this chapter is available here.

diff --git a/docs/previous_versions/v0.4.0/11-inference-for-regression.html b/docs/previous_versions/v0.4.0/11-inference-for-regression.html
new file mode 100644
index 000000000..1b98665d0
--- /dev/null
+++ b/docs/previous_versions/v0.4.0/11-inference-for-regression.html
@@ -0,0 +1,914 @@
+[HTML page header and navigation omitted; page title: 11 Inference for Regression | An Introduction to Statistical and Data Sciences via R]

          11 Inference for Regression

+Note: This chapter is still under construction. If you would like to contribute, please check us out on GitHub at https://github.com/moderndive/moderndive_book.

          Needed packages

          +

          Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). Read Section 2.3 for information on how to install and load R packages.

          +
          library(ggplot2)
          +library(dplyr)
          +library(moderndive)
          +library(infer)
          +
          +
          +

          DataCamp

          +

Our approach of understanding both the statistical and practical significance of any regression results is aligned with the approach taken in Jo Hardin’s DataCamp course “Inference for Regression.” If you’re interested in complementing your learning below in an interactive online environment, click on the image below to access the course.


          11.1 Simulation-based Inference for Regression

          +

We can also use the concept of permuting to determine the standard error of our null distribution and conduct a hypothesis test for a population slope. Let’s go back to our example on teacher evaluations from Chapters 6 and 7. We’ll begin in the basic regression setting to test whether we have evidence that a statistically significant positive relationship exists between teaching and beauty scores for the University of Texas professors. As we did in Chapter 6, teaching score will act as our outcome variable and bty_avg will be our explanatory variable. We will set up this hypothesis testing process as we have for each test before, via the “There is Only One Test” diagram in Figure 10.1, using the infer package.

          +
          +

          11.1.1 Data

          +

          Our data is stored in evals and we are focused on the measurements of the score and bty_avg variables there. Note that we don’t choose a subset of variables here since we will specify() the variables of interest using infer.

          +
          evals %>% 
          +  specify(score ~ bty_avg)
          +
          Response: score (numeric)
          +Explanatory: bty_avg (numeric)
          +# A tibble: 463 x 2
          +   score bty_avg
          +   <dbl>   <dbl>
          + 1   4.7    5   
          + 2   4.1    5   
          + 3   3.9    5   
          + 4   4.8    5   
          + 5   4.6    3   
          + 6   4.3    3   
          + 7   2.8    3   
          + 8   4.1    3.33
          + 9   3.4    3.33
          +10   4.5    3.17
          +# … with 453 more rows
          +
          +
          +

          11.1.2 Test statistic \(\delta\)

          +

          Our test statistic here is the sample slope coefficient that we denote with \(b_1\).

          +
          +
          +

          11.1.3 Observed effect \(\delta^*\)

          +

          We can use the specify() %>% calculate() shortcut here to determine the slope value seen in our observed data:

          +
          slope_obs <- evals %>% 
          +  specify(score ~ bty_avg) %>% 
          +  calculate(stat = "slope")
          +

          The calculated slope value from our observed sample is \(b_1 = 0.067\).

          +
          +
          +

          11.1.4 Model of \(H_0\)

          +

          We are looking to see if a positive relationship exists so \(H_A: \beta_1 > 0\). Our null hypothesis is always in terms of equality so we have \(H_0: \beta_1 = 0\). In other words, when we assume the null hypothesis is true, we are assuming there is NOT a linear relationship between teaching and beauty scores for University of Texas professors.

          +
          +
          +

          11.1.5 Simulated data

          +

          Now to simulate the null hypothesis being true and recreating how our sample was created, we need to think about what it means for \(\beta_1\) to be zero. If \(\beta_1 = 0\), we said above that there is no relationship between the teaching and beauty scores. If there is no relationship, then any one of the teaching score values could have just as likely occurred with any of the other beauty score values instead of the one that it actually did fall with. We, therefore, have another example of permuting in our simulating of data under the null hypothesis.

          +

          Tactile simulation

          +

          We could use a deck of 926 note cards to create a tactile simulation of this permuting process. We would write the 463 different values of beauty scores on each of the 463 cards, one per card. We would then do the same thing for the 463 teaching scores putting them on one per card.

          +

          Next, we would lay out each of the 463 beauty score cards and we would shuffle the teaching score deck. Then, after shuffling the deck well, we would disperse the cards one per each one of the beauty score cards. We would then enter these new values in for teaching score and compute a sample slope based on this permuting. We could repeat this process many times, keeping track of our sample slope after each shuffle.

          +
          +
          +

          11.1.6 Distribution of \(\delta\) under \(H_0\)

          +

          We can build our null distribution in much the same way we did in Chapter 10 using the generate() and calculate() functions. Note also the addition of the hypothesize() function, which lets generate() know to perform the permuting instead of bootstrapping.

          +
          null_slope_distn <- evals %>% 
          +  specify(score ~ bty_avg) %>%
          +  hypothesize(null = "independence") %>% 
          +  generate(reps = 10000) %>% 
          +  calculate(stat = "slope")
          +
          null_slope_distn %>% 
          +  visualize(obs_stat = slope_obs, direction = "greater")
          +

          +

          In viewing the distribution above with shading to the right of our observed slope value of 0.067, we can see that we expect the p-value to be quite small. Let’s calculate it next using a similar syntax to what was done with visualize().

          +
          +
          +

          11.1.7 The p-value

          +
          null_slope_distn %>% 
          +  get_pvalue(obs_stat = slope_obs, direction = "greater")
          +
          # A tibble: 1 x 1
          +  p_value
          +    <dbl>
          +1       0
          +

          Since 0.067 falls far to the right of this plot beyond where any of the histogram bins have data, we can say that we have a \(p\)-value of 0. We, thus, have evidence to reject the null hypothesis in support of there being a positive association between the beauty score and teaching score of University of Texas faculty members.

          +
          +

          +Learning check +

          +
          +

          (LC11.1) Repeat the inference above but this time for the correlation coefficient instead of the slope. Note the implementation of stat = "correlation" in the calculate() function of the infer package.

          +
          + +
          +
          +
          +
          +

          11.2 Bootstrapping for the regression slope

          +

          With the p-value calculated as 0 in the hypothesis test above, we can next determine just how strong of a positive slope value we might expect between the variables of teaching score and beauty score (bty_avg) for University of Texas faculty. Recall the infer pipeline above to compute the null distribution. Recall that this assumes the null hypothesis is true that there is no relationship between teaching score and beauty score using the hypothesize() function.

          +
          null_slope_distn <- evals %>% 
          +  specify(score ~ bty_avg) %>%
          +  hypothesize(null = "independence") %>% 
          +  generate(reps = 10000, type = "permute") %>% 
          +  calculate(stat = "slope")
          +

To further reinforce the process being done in the pipeline, we’ve added the type argument to generate(). This is automatically added based on the entries for specify() and hypothesize(), but it provides a useful way to check that generate() is creating the samples in the desired way. In this case, we permuted the values of one variable across the values of the other 10,000 times and calculated a "slope" coefficient for each of these 10,000 generated samples.

          +

          If instead we’d like to get a range of plausible values for the true slope value, we can use the process of bootstrapping:
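The pipeline that creates bootstrap_slope_distn is not shown at this point in the excerpt; a plausible construction, assuming the same infer pattern as above but with bootstrap resampling instead of permutation, would be:

# Sketch of an assumed bootstrap pipeline for the slope
bootstrap_slope_distn <- evals %>% 
  specify(score ~ bty_avg) %>% 
  # no hypothesize() step: we resample the 463 rows with replacement
  generate(reps = 10000, type = "bootstrap") %>% 
  calculate(stat = "slope")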

          +
          bootstrap_slope_distn %>% visualize()
          +

          +

          Next we can use the get_ci() function to determine the confidence interval. Let’s do this in two different ways obtaining 99% confidence intervals. Remember that these denote a range of plausible values for an unknown true population slope parameter regressing teaching score on beauty score.

          +
          percentile_slope_ci <- bootstrap_slope_distn %>% 
          +  get_ci(level = 0.99, type = "percentile")
          +percentile_slope_ci
          +
          # A tibble: 1 x 2
          +  `0.5%` `99.5%`
          +   <dbl>   <dbl>
          +1 0.0229   0.110
          +
          se_slope_ci <- bootstrap_slope_distn %>% 
          +  get_ci(level = 0.99, type = "se", point_estimate = slope_obs)
          +se_slope_ci
          +
          # A tibble: 1 x 2
          +   lower upper
          +   <dbl> <dbl>
          +1 0.0220 0.111
          +

          With the bootstrap distribution being close to symmetric, it makes sense that the two resulting confidence intervals are similar.

          + +
          +
          +

          11.3 Inference for multiple regression

          +
          +

          11.3.1 Refresher: Professor evaluations data

          +

Let’s revisit the professor evaluations data that we analyzed using multiple regression with one numerical and one categorical predictor. In particular:

• \(y\): outcome variable of instructor evaluation score
• predictor variables
  • \(x_1\): numerical explanatory/predictor variable of age
  • \(x_2\): categorical explanatory/predictor variable of gender
          +
          library(ggplot2)
          +library(dplyr)
          +library(moderndive)
          +
          +evals_multiple <- evals %>%
          +  select(score, ethnicity, gender, language, age, bty_avg, rank)
          +

First, recall that we had two competing potential models to explain professors’ teaching scores:

1. Model 1: No interaction term, i.e. both male and female profs have the same slope describing the associated effect of age on teaching score
2. Model 2: Includes an interaction term, i.e. we allow for male and female profs to have different slopes describing the associated effect of age on teaching score
          +
          +

          11.3.2 Refresher: Visualizations

          +

          Recall the plots we made for both these models:

          +
+Figure 11.1: Model 1: no interaction effect included

+Figure 11.2: Model 2: interaction effect included

          +
          +
          +
          +

          11.3.3 Refresher: Regression tables

          +

Last, let’s recall the regressions we fit. First, the regression with no interaction effect: note the use of + in the formula.

          +
          score_model_2 <- lm(score ~ age + gender, data = evals_multiple)
          +get_regression_table(score_model_2)
Table 11.1: Model 1: Regression table with no interaction effect included

| term       | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
|------------|----------|-----------|-----------|---------|----------|----------|
| intercept  | 4.484    | 0.125     | 35.79     | 0.000   | 4.238    | 4.730    |
| age        | -0.009   | 0.003     | -3.28     | 0.001   | -0.014   | -0.003   |
| gendermale | 0.191    | 0.052     | 3.63      | 0.000   | 0.087    | 0.294    |

          Second, the regression with an interaction effect: note the use of * in the formula.

          +
          score_model_3 <- lm(score ~ age * gender, data = evals_multiple)
          +get_regression_table(score_model_3)
Table 11.2: Model 2: Regression table with interaction effect included

| term           | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
|----------------|----------|-----------|-----------|---------|----------|----------|
| intercept      | 4.883    | 0.205     | 23.80     | 0.000   | 4.480    | 5.286    |
| age            | -0.018   | 0.004     | -3.92     | 0.000   | -0.026   | -0.009   |
| gendermale     | -0.446   | 0.265     | -1.68     | 0.094   | -0.968   | 0.076    |
| age:gendermale | 0.014    | 0.006     | 2.45      | 0.015   | 0.003    | 0.024    |
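One way to read the interaction estimate in Table 11.2 is as a difference in slopes. A brief worked sketch, assuming "female" is the baseline level of gender (as the gendermale dummy coding suggests):

# Fitted slope for age, by gender, from the Table 11.2 estimates
slope_female <- -0.018            # baseline age coefficient
slope_male   <- -0.018 + 0.014    # add the age:gendermale interaction
slope_male
#> [1] -0.004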
          +
          +
          +

          11.3.4 Script of R code

          +

          An R script file of all R code used in this chapter is available here.

diff --git a/docs/13-thinking-with-data.html b/docs/previous_versions/v0.4.0/12-thinking-with-data.html
similarity index 71%
rename from docs/13-thinking-with-data.html
rename to docs/previous_versions/v0.4.0/12-thinking-with-data.html
index 56d5260ac..5e0ae1d8a 100644
--- a/docs/13-thinking-with-data.html
+++ b/docs/previous_versions/v0.4.0/12-thinking-with-data.html
@@ -5,11 +5,11 @@
-Chapter 13 Thinking with Data | Statistical Inference via Data Science
+12 Thinking with Data | An Introduction to Statistical and Data Sciences via R
[Remaining HTML header and navigation diff omitted]

        All this was our approach of guiding you through your first experiences of “thinking with data”, an expression originally coined by Diane Lambert of Google. How the philosophy underlying this expression guided our mapping of the flowchart above was well put in the introduction to the “Practical Data Science for Stats” collection of preprints focusing on the practical side of data science workflows and statistical analysis, curated by Jennifer Bryan and Hadley Wickham:

        @@ -603,11 +590,11 @@

        Chapter 13 Thinking with Data

        Data/Science Pipeline

-FIGURE 13.2: Data/Science Pipeline
+Figure 12.2: Data/Science Pipeline

        -

        In Section 13.1, we’ll take you through full-pass of the “Data/Science Pipeline” where we’ll analyze the sale price of houses in Seattle, WA, USA. In Section 13.2, we’ll present you with examples of effective data storytelling, in particular the articles from the data journalism website FiveThirtyEight.com, many of whose source datasets are accessible from the fivethirtyeight R package.

        -
        +

In Section 12.1, we’ll take you through a full pass of the “Data/Science Pipeline” where we’ll analyze the sale price of houses in Seattle, WA, USA. In Section 12.2, we’ll present you with examples of effective data storytelling, in particular the articles from the data journalism website FiveThirtyEight.com, many of whose source datasets are accessible from the fivethirtyeight R package.

        +

        Needed packages

        Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). Read Section 2.3 for information on how to install and load R packages.

        library(ggplot2)
        @@ -615,17 +602,17 @@ 

        Needed packages

library(moderndive)
library(fivethirtyeight)
        -
        +

        DataCamp

        -

        The case study of Seattle house prices below was the inspiration for a large part of ModernDive co-author Albert Y. Kim’s DataCamp course “Modeling with Data in the Tidyverse.” If you’re interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters are Chapter 1 “Introduction to Modeling” and Chapter 3 “Modeling with Multiple Regression.”

        +

        The case study of Seattle house prices below was the inspiration for a large part of ModernDive co-author Albert Y. Kim’s DataCamp course “Modeling with Data in the Tidyverse.” If you’re interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters are Chapter 1 “Introduction to Modeling” and Chapter 3 “Modeling with Multiple Regression”.


        Case studies involving data in the fivethirtyeight R package form the basis of ModernDive co-author Chester Ismay’s DataCamp course “Effective Data Storytelling in the Tidyverse.” This free course can be accessed here.

        +

Case studies involving data in the fivethirtyeight R package form the basis of ModernDive co-author Chester Ismay’s DataCamp course “Effective Data Storytelling in the Tidyverse”. This free course can be accessed here.


        -

        13.1 Case study: Seattle house prices

        +

        12.1 Case study: Seattle house prices

        Kaggle.com is a machine learning and predictive modeling competition website that hosts datasets uploaded by companies, governmental organizations, and other individuals. One of their datasets is the House Sales in King County, USA consisting of homes sold in between May 2014 and May 2015 in King County, Washington State, USA, which includes the greater Seattle metropolitan area. This CC0: Public Domain licensed dataset is included in the moderndive package in the house_prices data frame, which we’ll refer to as the “Seattle house prices” dataset.

The dataset consists of 21,613 houses and 21 variables describing these houses; for a full list of these variables see the help file by running ?house_prices in the console. In this case study, we’ll create a model using multiple regression where:

@@ -641,12 +628,12 @@ 13.1 Case study: Seattle house pr

library(dplyr)
library(moderndive)
          -

          13.1.1 Exploratory data analysis (EDA)

          +

          12.1.1 Exploratory data analysis (EDA)

A crucial first step before any formal modeling is an exploratory data analysis, commonly abbreviated as EDA. Exploratory data analysis can give you a sense of your data, help identify issues with your data, bring to light any outliers, and help inform model construction. There are three basic approaches to EDA:

1. Most fundamentally, just looking at the raw data. For example using RStudio’s View() spreadsheet viewer or the glimpse() function from the dplyr package
2. Creating visualizations like the ones using ggplot2 from Chapter 3
-3. Computing summary statistics using the dplyr data wrangling tools from Chapter 4
+3. Computing summary statistics using the dplyr data wrangling tools from Chapter 5

First, let’s look at the raw data using View() and the glimpse() function. Explore the dataset. Which variables are numerical and which are categorical? For the categorical variables, what are their levels? Which do you think would be useful variables to use in a model for house price? In this case study, we’ll only consider the variables price, sqft_living, and condition. An important thing to observe is that while the condition variable has values 1 through 5, these are saved in R as fct factors, i.e. R’s way of saving categorical variables. So you should think of these as the “labels” 1 through 5 and not the numerical values 1 through 5.

          View(house_prices)
          @@ -674,7 +661,7 @@ 

          13.1.1 Exploratory data analysis $ long <dbl> -122, -122, -122, -122, -122, -122, -122, -122, -122, -… $ sqft_living15 <int> 1340, 1690, 2720, 1360, 1800, 4760, 2238, 1650, 1780, 2… $ sqft_lot15 <int> 5650, 7639, 8062, 5000, 7503, 101930, 6819, 9711, 8113,…

          -

          Let’s now perform the second possible approach to EDA: creating visualizations. Since price and sqft_living are numerical variables, an appropriate way to visualize of these variables’ distributions would be using a histogram using a geom_histogram() as seen in Section 3.5. However, since condition is categorical, a barplot using a geom_bar() yields an appropriate visualization of its distribution. Recall from Section 3.8 that since condition is not “pre-counted”, we use a geom_bar() and not a geom_col(). In Figure 13.3, we display all three of these visualizations at once.

          +

Let’s now perform the second possible approach to EDA: creating visualizations. Since price and sqft_living are numerical variables, an appropriate way to visualize these variables’ distributions would be a histogram via a geom_histogram() as seen in Section 3.5. However, since condition is categorical, a barplot using a geom_bar() yields an appropriate visualization of its distribution. Recall from Section 3.8 that since condition is not “pre-counted”, we use a geom_bar() and not a geom_col(). In Figure 12.3, we display all three of these visualizations at once.

          # Histogram of house price:
           ggplot(house_prices, aes(x = price)) +
             geom_histogram(color = "white") +
          @@ -692,7 +679,7 @@ 

          13.1.1 Exploratory data analysis
          Exploratory visualizations of Seattle house prices data

-FIGURE 13.3: Exploratory visualizations of Seattle house prices data
+Figure 12.3: Exploratory visualizations of Seattle house prices data

          We observe the following:

          @@ -710,7 +697,7 @@

          13.1.1 Exploratory data analysis
        • Most houses are of condition 3, 4, or 5.
• In the case of price, why does the x-axis stretch so far to the right? It is because there are a very small number of houses with price closer to 8 million; these prices are outliers in this case. We say the variable is “right skewed” as exhibited by the long right tail. This skew makes it difficult to compare prices of the less expensive houses as the more expensive houses dominate the scale of the x-axis. This is similarly the case for sqft_living.

          -

          Let’s now perform the third possible approach to EDA: computing summary statistics. In particular, let’s compute 4 summary statistics using the summarize() data wrangling verb from Section 4.3.

          +

          Let’s now perform the third possible approach to EDA: computing summary statistics. In particular, let’s compute 4 summary statistics using the summarize() data wrangling verb from Section 5.4.

          • Two measures of center: the mean and median
          • Two measures of variability/spread: the standard deviation and interquartile-range (IQR = 3rd quartile - 1st quartile)
          • @@ -729,132 +716,65 @@

            13.1.1 Exploratory data analysis

            Observe the following:

1. The mean price of $540,088 is larger than the median of $450,000. This is because the small number of very expensive outlier house prices is inflating the average, whereas since the median is the “middle” value, it is not as sensitive to such large values at the high end. This is why the news typically reports median house prices and not average house prices when describing the real estate market. We say here that the median is more “robust to outliers” than the mean.
-2. Similarly, while the standard deviation and IQR are both measures of spread and variability, the IQR is more “robust to outliers.”
+2. Similarly, while the standard deviation and IQR are both measures of spread and variability, the IQR is more “robust to outliers”.

            If you repeat the above summarize() for sqft_living, you’ll find a similar relationship between mean vs median and standard deviation vs IQR given its similar right-skewed nature. Is there anything we can do about this right-skew? Again, this could potentially be an issue because we’ll have a harder time discriminating between houses at the lower end of price and sqft_living, which might lead to a problem when modeling.

            We can in fact address this issue by using a log base 10 transformation, which we cover next.

          -

          13.1.2 log10 transformations

          -

          At its simplest, log10() transformations returns base 10 logarithms. For example, since \(1000 = 10^3\), log10(1000) returns 3. To undo a log10-transformation, we raise 10 to this value. For example, to undo the previous log10-transformation and return the original value of 1000, we raise 10 to this value \(10^{3}\) by running 10^(3) = 1000. log-transformations allow us to focus on multiplicative changes instead of additive ones, thereby emphasizing changes in “orders of magnitude.” Let’s illustrate this idea in Table 13.1 with examples of prices of consumer goods in US dollars.

          - - +

          12.1.2 log10 transformations

          +

At its simplest, a log10() transformation returns base 10 logarithms. For example, since \(1000 = 10^3\), log10(1000) returns 3. To undo a log10-transformation, we raise 10 to this value. For example, to undo the previous log10-transformation and return the original value of 1000, we raise 10 to this value \(10^{3}\) by running 10^(3) = 1000. log-transformations allow us to focus on multiplicative changes instead of additive ones, thereby emphasizing changes in “orders of magnitude.” Let’s illustrate this idea in Table ?? with examples of prices of consumer goods in US dollars.

          +
          -TABLE 13.1: log10-transformed prices, orders of magnitude, and examples -
          - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + +
          -Price - -log10(Price) - -Order of magnitude - -Examples -
          Pricelog10(Price)Order of magnitudeExamples
          -$1 - -0 - -Singles - -Cups of coffee -
          $10SinglesCups of coffee
          -$10 - -1 - -Tens - -Books -
          $101TensBooks
          -$100 - -2 - -Hundreds - -Mobile phones -
          $1002HundredsMobile phones
          -$1,000 - -3 - -Thousands - -High definition TV’s -
          $1,0003ThousandsHigh definition TV’s
          -$10,000 - -4 - -Tens of thousands - -Cars -
          $10,0004Tens of thousandsCars
          -$100,000 - -5 - -Hundreds of thousands - -Luxury cars & houses -
          $100,0005Hundreds of thousandsLuxury cars & houses
          -$1,000,000 - -6 - -Millions - -Luxury houses -
          $1,000,0006MillionsLuxury houses
          @@ -865,7 +785,7 @@

          13.1.2 log10 transformations

        • log10-transformations are monotonic, meaning they preserve orderings. So if Price A is lower than Price B, then log10(Price A) will also be lower than log10(Price B).
        • Most importantly, increments of one in log10 correspond to multiplicative changes and not additive ones. For example, increasing from log10(Price) of 3 to 4 corresponds to a multiplicative increase by a factor of 10: $100 to $1000.
        • -

          Let’s create new log10-transformed versions of the right-skewed variable price and sqft_living using the mutate() function from Section 4.5, but we’ll give the latter the name log10_size, which is a little more succinct and descriptive a variable name.

          +

          Let’s create new log10-transformed versions of the right-skewed variable price and sqft_living using the mutate() function from Section 5.6, but we’ll give the latter the name log10_size, which is a little more succinct and descriptive a variable name.

          house_prices <- house_prices %>%
             mutate(
               log10_price = log10(price),
          @@ -892,7 +812,7 @@ 

          13.1.2 log10 transformations

        • The house in the 6th row with price $1,225,000, which is just above one million dollars. Since \(10^6\) is one million, its log10_price is 6.09. Contrast this with all other houses with log10_price less than 6.
• Similarly, there is only one house with size sqft_living less than 1000. Since \(1000 = 10^3\), it’s the lone house with log10_size less than 3.
        -

        Let’s now visualize the before and after effects of this transformation for price in Figure 13.4.

        +

        Let’s now visualize the before and after effects of this transformation for price in Figure 12.4.

        # Before:
         ggplot(house_prices, aes(x = price)) +
           geom_histogram(color = "white") +
        @@ -905,10 +825,10 @@ 

        13.1.2 log10 transformations

        House price before and after log10-transformation

-FIGURE 13.4: House price before and after log10-transformation
+Figure 12.4: House price before and after log10-transformation

        -

        Observe that after the transformation, the distribution is much less skewed, and in this case, more symmetric and bell-shaped, although this isn’t always necessarily the case. Now you can now better discriminate between house prices at the lower end of the scale. Let’s do the same for size where the before variable is sqft_living and the after variable is log10_size. Observe in Figure 13.5 that the log10-transformation has a similar effect of un-skewing the variable. Again, we emphasize that while in these two cases the resulting distributions are more symmetric and bell-shaped, this is not always necessarily the case.

        +

Observe that after the transformation, the distribution is much less skewed, and in this case, more symmetric and bell-shaped, although this isn’t always necessarily the case. Now you can better discriminate between house prices at the lower end of the scale. Let’s do the same for size where the before variable is sqft_living and the after variable is log10_size. Observe in Figure 12.5 that the log10-transformation has a similar effect of un-skewing the variable. Again, we emphasize that while in these two cases the resulting distributions are more symmetric and bell-shaped, this is not always necessarily the case.

        # Before:
         ggplot(house_prices, aes(x = sqft_living)) +
           geom_histogram(color = "white") +
        @@ -921,7 +841,7 @@ 

        13.1.2 log10 transformations

        House size before and after log10-transformation

-FIGURE 13.5: House size before and after log10-transformation
+Figure 12.5: House size before and after log10-transformation

        Given the now un-skewed nature of log10_price and log10_size, we are going to revise our modeling structure:

        @@ -935,16 +855,16 @@

        13.1.2 log10 transformations

      -

      13.1.3 EDA Part II

      -

      Let’s continue our exploratory data analysis from Subsection 13.1.1 above. The earlier EDA you performed was univariate in nature in that we only considered one variable at a time. The goal of modeling, however, is to explore relationships between variables. So we must jointly consider the relationship between the outcome variable log10_price and the explanatory/predictor variables log10_size (numerical) and condition (categorical). We viewed such a modeling scenario in Section 7.2 using the evals dataset, where the outcome variable was teaching score, the numerical explanatory/predictor variable was instructor age and the categorical explanatory/predictor variable was (binary) gender.

      -

      We have two possible visual models. Either a parallel slopes model in Figure 13.6 where we have a different regression line for each of the 5 possible condition levels, each with a different intercept but the same slope:

      +

      12.1.3 EDA Part II

      +

      Let’s continue our exploratory data analysis from Subsection 12.1.1 above. The earlier EDA you performed was univariate in nature in that we only considered one variable at a time. The goal of modeling, however, is to explore relationships between variables. So we must jointly consider the relationship between the outcome variable log10_price and the explanatory/predictor variables log10_size (numerical) and condition (categorical). We viewed such a modeling scenario in Section 7.2 using the evals dataset, where the outcome variable was teaching score, the numerical explanatory/predictor variable was instructor age and the categorical explanatory/predictor variable was (binary) gender.

      +

      We have two possible visual models. Either a parallel slopes model in Figure 12.6 where we have a different regression line for each of the 5 possible condition levels, each with a different intercept but the same slope:

      Parallel slopes model

-FIGURE 13.6: Parallel slopes model
+Figure 12.6: Parallel slopes model

      -

      Or an interaction model in Figure 13.7, where we allow each regression line to not only have different intercepts, but different slopes as well:

      +

      Or an interaction model in Figure 12.7, where we allow each regression line to not only have different intercepts, but different slopes as well:

      ggplot(house_prices, aes(x = log10_size, y = log10_price, col = condition)) +
         geom_point(alpha = 0.1) +
         labs(y = "log10 price", x = "log10 size", title = "House prices in Seattle") +
      @@ -952,10 +872,10 @@ 

      13.1.3 EDA Part II

      Interaction model

-FIGURE 13.7: Interaction model
+Figure 12.7: Interaction model

      -

      In both cases, we see there is a positive relationship between house price and size, meaning as houses are larger, they tend to be more expensive. Furthermore, in both plots it seems that houses of condition 5 tend to be the most expensive for most house sizes as evidenced by the fact that the purple line is highest, followed by condition 4 and 3. As for condition 1 and 2, this pattern isn’t as clear, as if you recall from the univariate barplot of condition in Figure 13.3 there are very few houses of condition 1 or 2. This reality is more apparent in an alternative visualization to Figure 13.7 displayed in Figure 13.8 that uses facets instead:

      +

In both cases, we see there is a positive relationship between house price and size, meaning as houses are larger, they tend to be more expensive. Furthermore, in both plots it seems that houses of condition 5 tend to be the most expensive for most house sizes as evidenced by the fact that the purple line is highest, followed by condition 4 and 3. As for condition 1 and 2, this pattern isn’t as clear, as if you recall from the univariate barplot of condition in Figure 12.3 there are very few houses of condition 1 or 2. This reality is more apparent in an alternative visualization to Figure 12.7 displayed in Figure 12.8 that uses facets instead:

      ggplot(house_prices, aes(x = log10_size, y = log10_price, col = condition)) +
         geom_point(alpha = 0.3) +
         labs(y = "log10 price", x = "log10 size", title = "House prices in Seattle") +
      @@ -964,14 +884,14 @@ 

      13.1.3 EDA Part II

      Interaction model with facets

-FIGURE 13.8: Interaction model with facets
+Figure 12.8: Interaction model with facets

      -

      Which exploratory visualization of the interaction model is better, the one in Figure 13.7 or Figure 13.8? There is no universal right answer, you need to make a choice depending on what you want to convey, and own it.

      +

      Which exploratory visualization of the interaction model is better, the one in Figure 12.7 or Figure 12.8? There is no universal right answer, you need to make a choice depending on what you want to convey, and own it.

      -

      13.1.4 Regression modeling

      -

      For now let’s focus on the latter, interaction model we’ve visualized in Figure 13.8 above. What are the 5 different slopes and intercepts for the condition = 1, condition = 2, …, and condition = 5 lines in Figure 13.8? To determine these, we first need the values from the regression table:

      +

      12.1.4 Regression modeling

      +

      For now let’s focus on the latter, interaction model we’ve visualized in Figure 12.8 above. What are the 5 different slopes and intercepts for the condition = 1, condition = 2, …, and condition = 5 lines in Figure 12.8? To determine these, we first need the values from the regression table:

      # Fit regression model:
       price_interaction <- lm(log10_price ~ log10_size * condition, data = house_prices)
       # Get regression table:
      @@ -999,7 +919,7 @@ 

      13.1.4 Regression modeling

    • Condition 4: \(\widehat{\log10(\text{price})} = (3.33 - 0.398) + (0.69 + 0.146) * \log10(\text{size}) = 2.93 + 0.836 * \log10(\text{size})\)
    • Condition 5: \(\widehat{\log10(\text{price})} = (3.33 - 0.883) + (0.69 + 0.31) * \log10(\text{size}) = 2.45 + 1 * \log10(\text{size})\)
    • -

      These correspond to the regression lines in the exploratory visualization of the interaction model in Figure 13.7 above. For homes of all 5 condition types, as the size of the house increases, the prices increases. This is what most would expect. However, the rate of increase of price with size is fastest for the homes with condition 3, 4, and 5 of 0.823, 0.836, and 1 respectively; these are the 3 most largest slopes out of the 5.

      +

These correspond to the regression lines in the exploratory visualization of the interaction model in Figure 12.7 above. For homes of all 5 condition types, as the size of the house increases, the price increases. This is what most would expect. However, the rate of increase of price with size is fastest for the homes with condition 3, 4, and 5 of 0.823, 0.836, and 1 respectively; these are the 3 largest slopes out of the 5.

      -

      13.1.5 Making predictions

      -

      Say you’re a realtor and someone calls you asking you how much their home will sell for. They tell you that it’s in condition = 5 and is sized 1900 square feet. What do you tell them? We first make this prediction visually in Figure 13.9. The predicted log10_price of this house is marked with a black dot: it is where the two following lines intersect:

      +

      12.1.5 Making predictions

      +

      Say you’re a realtor and someone calls you asking you how much their home will sell for. They tell you that it’s in condition = 5 and is sized 1900 square feet. What do you tell them? We first make this prediction visually in Figure 12.9. The predicted log10_price of this house is marked with a black dot: it is where the two following lines intersect:

      • The purple regression line for the condition = 5 homes and
      • The vertical dashed black line at log10_size equals 3.28, since our predictor variable is the log10-transformed square feet of living space and \(\log10(1900) = 3.28\) .
      • @@ -1016,14 +936,14 @@

        13.1.5 Making predictions

        Interaction model with prediction

-FIGURE 13.9: Interaction model with prediction
+Figure 12.9: Interaction model with prediction

Eyeballing it, the predicted log10_price seems to be around 5.72. Let’s now obtain an exact numerical value for the prediction using the values of the intercept and slope for condition = 5 that we computed from the regression table output. We use the equation for the condition = 5 line, being sure to log10() the square footage first.

        2.45 + 1 * log10(1900)
        [1] 5.73
        -

        This value is very close to our earlier visually made prediction of 5.72. But wait! We were using the outcome variable log10_price as our outcome variable! So if we want a prediction in terms of price in dollar units, we need to un-log this by taking a power of 10 as described in Section 13.1.2.

        +

This value is very close to our earlier visually made prediction of 5.72. But wait! We were using log10_price as our outcome variable! So if we want a prediction in terms of price in dollar units, we need to un-log this by taking a power of 10 as described in Section 12.1.2.

        10^(2.45 + 1 * log10(1900))
        [1] 535493

So our predicted price for this home of condition 5 and size 1900 square feet is $535,493.

        Learning check
      (LC12.1) Repeat the regression modeling in Subsection 12.1.4 and the prediction making you just did on the house of condition 5 and size 1900 square feet in Subsection 12.1.5, but using the parallel slopes model you visualized in Figure 12.6. Hint: it’s $524,807!
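      A minimal sketch of how this learning check could be set up, assuming the house_prices data frame and the log10 transformations used for the interaction model earlier in the chapter (the object name price_parallel and the mutate() step are our assumptions, not code from the book):

```r
library(dplyr)
library(moderndive)

# Assumed setup: log10 transformations of price and living space
house_prices <- house_prices %>%
  mutate(log10_price = log10(price),
         log10_size  = log10(sqft_living))

# Parallel slopes model: one common slope for log10_size, one offset per condition
price_parallel <- lm(log10_price ~ log10_size + condition, data = house_prices)
get_regression_table(price_parallel)

# As with the interaction model above, plug the condition = 5 intercept and the
# common slope into the prediction formula, then un-log:
# 10^((intercept + condition_5_offset) + log10_size_slope * log10(1900))
```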

diff --git a/docs/previous_versions/v0.4.0/3-tidy.html b/docs/previous_versions/v0.4.0/3-tidy.html
new file mode 100644
index 000000000..e69de29bb

diff --git a/docs/previous_versions/v0.4.0/3-viz.html b/docs/previous_versions/v0.4.0/3-viz.html
new file mode 100644
index 000000000..871f30f95
--- /dev/null
+++ b/docs/previous_versions/v0.4.0/3-viz.html
@@ -0,0 +1,1711 @@

      3 Data Visualization via ggplot2

      +

      We begin the development of your data science toolbox with data visualization. By visualizing our data, we will be able to gain valuable insights from our data that we couldn’t initially see from just looking at the raw data in spreadsheet form. We will use the ggplot2 package as it provides an easy way to customize your plots and is rooted in the data visualization theory known as The Grammar of Graphics (Wilkinson 2005).

      +

      At the most basic level, graphics/plots/charts (we use these terms interchangeably in this book) provide a nice way for us to get a sense for how quantitative variables compare in terms of their center (where the values tend to be located) and their spread (how they vary around the center). The most important thing to know about graphics is that they should be created to make it obvious for your audience to understand the findings and insight you want to get across. This does however require a balancing act. On the one hand, you want to highlight as many meaningful relationships and interesting findings as possible, but on the other you don’t want to include so many as to overwhelm your audience.

      +

      As we will see, plots/graphics also help us to identify patterns and outliers in our data. We will see that a common extension of these ideas is to compare the distribution of one quantitative variable (i.e., what the spread of a variable looks like or how the variable is distributed in terms of its values) as we go across the levels of a different categorical variable.

      +
      +

      Needed packages

      +

      Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). Read Section 2.3 for information on how to install and load R packages.

      +
      library(nycflights13)
      +library(ggplot2)
      +library(dplyr)
      +
      +
      +

      DataCamp

      +

      Our approach to introducing data visualization via the Grammar of Graphics and the ggplot2 package is very similar to the approach taken in David Robinson’s DataCamp course “Introduction to the Tidyverse,” a course targeted at people new to R and the tidyverse. If you’re interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters of the course are Chapter 2 on “Data visualization” and Chapter 4 on “Types of visualizations”.

      +
      +Drawing +
      +
      +
      +

      3.1 The Grammar of Graphics

      +

      We begin with a discussion of a theoretical framework for data visualization known as the “The Grammar of Graphics,” which serves as the basis for the ggplot2 package. Much like how we construct sentences in any language by using a linguistic grammar (nouns, verbs, subjects, objects, etc.), the theoretical framework given by Leland Wilkinson (Wilkinson 2005) allows us to specify the components of a statistical graphic.

      +
      +

      3.1.1 Components of the Grammar

      +

      In short, the grammar tells us that:

      +
      +

      A statistical graphic is a mapping of data variables to aesthetic attributes of geometric objects.

      +
      +

      Specifically, we can break a graphic into the following three essential components:

      1. data: the data-set comprised of variables that we map.
      2. geom: the geometric object in question. This refers to the type of objects we can observe in our plot. For example: points, lines, bars, etc.
      3. aes: aesthetic attributes of the geometric object that we can perceive on a graphic. For example, x/y position, color, shape, and size. Each assigned aesthetic attribute can be mapped to a variable in our data-set.

      Let’s break down the grammar with an example.

      +
      +
      +

      3.1.2 Gapminder

      +

      In February 2006, a statistician named Hans Rosling gave a TED talk titled “The best stats you’ve ever seen” where he presented global economic, health, and development data from the website gapminder.org. For example, of the 1704 country-year rows in this data (142 countries observed over multiple years), consider only the 142 rows for 2007, and of those only the first 6 countries when listed alphabetically:

      Table 3.1: Gapminder 2007 Data: First 6 of 142 countries

      | Country     | Continent | Life Expectancy | Population | GDP per Capita |
      |-------------|-----------|-----------------|------------|----------------|
      | Afghanistan | Asia      | 43.83           | 31889923   | 974.58         |
      | Albania     | Europe    | 76.42           | 3600523    | 5937.03        |
      | Algeria     | Africa    | 72.30           | 33333216   | 6223.37        |
      | Angola      | Africa    | 42.73           | 12420476   | 4797.23        |
      | Argentina   | Americas  | 75.32           | 40301927   | 12779.38       |
      | Australia   | Oceania   | 81.23           | 20434176   | 34435.37       |

      Each row in this table corresponds to a country in 2007. For each row, we have 5 columns:

      +
      1. Country: Name of country.
      2. Continent: Which of the five continents the country is part of. (Note that Americas groups North and South America and that Antarctica is excluded here.)
      3. Life Expectancy: Life expectancy in years.
      4. Population: Number of people living in the country.
      5. GDP per Capita: Gross domestic product per person (in US dollars).

      Now consider Figure 3.1, which plots this data for all 142 countries in the data frame. Note that R will deal with large numbers using scientific notation. So in the legend for “Population”, 1.25e+09 = \(1.25 \times 10^{9}\) = 1,250,000,000 = 1.25 billion.

      +
      +Life Expectancy over GDP per Capita in 2007 +

      +Figure 3.1: Life Expectancy over GDP per Capita in 2007 +

      +
      +

      Let’s view this plot through the grammar of graphics:

      +
      1. The data variable GDP per Capita gets mapped to the x-position aesthetic of the points.
      2. The data variable Life Expectancy gets mapped to the y-position aesthetic of the points.
      3. The data variable Population gets mapped to the size aesthetic of the points.
      4. The data variable Continent gets mapped to the color aesthetic of the points.

      Recall that data here corresponds to each of the variables being in the same data frame and the “data variable” corresponds to a column in a data frame.

      +

      While in this example we are considering one type of geometric object (of type point), graphics are not limited to just points. Some plots involve lines while others involve bars. Let’s summarize the three essential components of the grammar in a table:

      Table 3.2: Summary of Grammar of Graphics for this plot

      | data variable   | aes   | geom  |
      |-----------------|-------|-------|
      | GDP per Capita  | x     | point |
      | Life Expectancy | y     | point |
      | Population      | size  | point |
      | Continent       | color | point |
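      The ggplot2 code for Figure 3.1 isn’t shown at this point in the chapter. A rough sketch of how such a plot could be built is below, assuming the gapminder data frame from the gapminder package (the package and its column names gdpPercap, lifeExp, pop, and continent are our assumptions here, not part of the original chapter):

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Keep only the 142 rows for the year 2007, then map the four data variables
# to the x, y, size, and color aesthetics of points:
gapminder_2007 <- gapminder %>% filter(year == 2007)
ggplot(data = gapminder_2007,
       mapping = aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
  geom_point()
```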

      3.1.3 Other components of the Grammar

      +

      There are other components of the Grammar of Graphics we can control. As you start to delve deeper into the Grammar of Graphics, you’ll start to encounter these topics more and more often. In this book, we’ll only work with the two other components below (The other components are left to a more advanced text such as R for Data Science (Grolemund and Wickham 2016)):

      +
        +
      • faceting breaks up a plot into small multiples corresponding to the levels of another variable (Section 3.6)
      • +
      • position adjustments for barplots (Section 3.8) +
      • +
      +

      In general, the Grammar of Graphics allows for a high degree of customization and also a consistent framework for easy updating/modification of plots.

      +
      +
      +

      3.1.4 The ggplot2 package

      +

      In this book, we will be using the ggplot2 package for data visualization, which is an implementation of the Grammar of Graphics for R (Wickham et al. 2018). You may have noticed that a lot of the previous text in this chapter is written in computer font. This is because the various components of the Grammar of Graphics are specified in the ggplot function, which expects at a bare minimum as arguments:

      +
        +
      • The data frame where the variables exist: the data argument
      • +
      • The mapping of the variables to aesthetic attributes: the mapping argument, which specifies the aesthetic attributes involved
      • +
      +

      After we’ve specified these components, we then add layers to the plot using the + sign. The most essential layer to add to a plot is the specification of which type of geometric object we want the plot to involve; e.g. points, lines, bars. Other layers we can add include the specification of the plot title, axes labels, facets, and visual themes for the plot.

      +

      Let’s now put the theory of the Grammar of Graphics into practice.

      + + +
      +
      +
      +

      3.2 Five Named Graphs - The 5NG

      +

      For our purposes, we will be limiting consideration to five different types of graphs. We term these five named graphs the 5NG:

      +
        +
      1. scatterplots
      2. +
      3. linegraphs
      4. +
      5. boxplots
      6. +
      7. histograms
      8. +
      9. barplots
      10. +
      +

      We will discuss some variations of these plots, but with this basic repertoire in your toolbox you can visualize a wide array of different data variable types. Note that certain plots are only appropriate for categorical/logical variables and others only for quantitative variables. You’ll want to quiz yourself often as we go along on which plot makes sense given a particular problem or data-set.

      + +
      +
      +

      3.3 5NG#1: Scatterplots

      +

      The simplest of the 5NG are scatterplots (also called bivariate plots); they allow you to investigate the relationship between two numerical variables. While you may already be familiar with this type of plot, let’s view it through the lens of the Grammar of Graphics. Specifically, we will graphically investigate the relationship between the following two numerical variables in the flights data frame:

      +
      1. dep_delay: departure delay on the horizontal “x” axis and
      2. arr_delay: arrival delay on the vertical “y” axis

      for Alaska Airlines flights leaving NYC in 2013. This requires paring down the flights data frame to a smaller data frame all_alaska_flights consisting of only Alaska Airlines (carrier code "AS") flights. Don’t worry for now if you don’t fully understand what this code is doing; we’ll explain it in detail in Chapter 5. Just run it all and understand that we are taking all flights and only considering those corresponding to Alaska Airlines.

      +
      all_alaska_flights <- flights %>% 
      +  filter(carrier == "AS")
      +

      This code snippet makes use of functions in the dplyr package for data wrangling to achieve our goal: it takes the flights data frame and filters it to only return the rows which meet the condition carrier == "AS". Recall from Section 2.2 that testing for equality is specified with == and not =. You will see many more examples of == and filter() in Chapter 5.

      +
      +

      +Learning check +

      +
      +

      (LC3.1) Take a look at both the flights and all_alaska_flights data frames by running View(flights) and View(all_alaska_flights) in the console. In what respect do these data frames differ?

      +
      + +
      +
      +

      3.3.1 Scatterplots via geom_point

      +

      We proceed to create the scatterplot using the ggplot() function:

      +
      ggplot(data = all_alaska_flights, mapping = aes(x = dep_delay, y = arr_delay)) + 
      +  geom_point()
      +
      +Arrival Delays vs Departure Delays for Alaska Airlines flights from NYC in 2013 +

      +Figure 3.2: Arrival Delays vs Departure Delays for Alaska Airlines flights from NYC in 2013 +

      +
      +

      In Figure 3.2 we see that a positive relationship exists between dep_delay and arr_delay: as departure delays increase, arrival delays tend to also increase. We also note that the majority of points fall near the point (0, 0). There is a large mass of points clustered there. Furthermore after executing this code, R returns a warning message alerting us to the fact that 5 rows were ignored due to missing values. For 5 rows either the value for dep_delay or arr_delay or both were missing, and thus these rows were ignored in our plot.

      +

      Let’s go back to the ggplot() function call that created this visualization, keeping in mind our discussion in Section 3.1:

      +
      • Within the ggplot() function call, we specify two of the components of the grammar:
        1. The data frame to be all_alaska_flights by setting data = all_alaska_flights
        2. The aesthetic mapping by setting aes(x = dep_delay, y = arr_delay). Specifically:
           • the variable dep_delay maps to the x position aesthetic
           • the variable arr_delay maps to the y position aesthetic
      • We add a layer to the ggplot() function call using the + sign. The layer in question specifies the third component of the grammar: the geometric object. In this case the geometric objects are points, set by specifying geom_point().

      Some notes on layers:

      +
        +
      • Note that the + sign comes at the end of lines, and not at the beginning. You’ll get an error in R if you put it at the beginning.
      • +
      • When adding layers to a plot, you are encouraged to hit Return on your keyboard after entering the + so that the code for each layer is on a new line. As we add more and more layers to plots, you’ll see this will greatly improve the legibility of your code.
      • +
      • To stress the importance of adding layers, in particular the layer specifying the geometric object, consider Figure 3.3 where no layers are added. A not very useful plot!
      • +
      +
      ggplot(data = all_alaska_flights, mapping = aes(x = dep_delay, y = arr_delay))
      +
      +Plot with No Layers +

      +Figure 3.3: Plot with No Layers +

      +
      +
      +

      +Learning check +

      +
      +

      (LC3.2) What are some practical reasons why dep_delay and arr_delay have a positive relationship?

      +

      (LC3.3) What variables (not necessarily in the flights data frame) would you expect to have a negative correlation (i.e. a negative relationship) with dep_delay? Why? Remember that we are focusing on numerical variables here.

      +

      (LC3.4) Why do you believe there is a cluster of points near (0, 0)? What does (0, 0) correspond to in terms of the Alaskan flights?

      +

      (LC3.5) What are some other features of the plot that stand out to you?

      +

      (LC3.6) Create a new scatterplot using different variables in the all_alaska_flights data frame by modifying the example above.
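      One possible sketch for this learning check (the choice of variables is ours, not the book’s): distance flown versus time in the air.

```r
library(ggplot2)
# Two other numerical variables in all_alaska_flights: distance and air_time
ggplot(data = all_alaska_flights, mapping = aes(x = distance, y = air_time)) +
  geom_point()
```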

      +
      + +
      +
      +
      +

      3.3.2 Over-plotting

      +

      The large mass of points near (0, 0) in Figure 3.2 can cause some confusion. This is the result of a phenomenon called overplotting. As one may guess, this corresponds to values being plotted on top of each other over and over again. It is often difficult to know just how many values are plotted in this way when looking at a basic scatterplot as we have here. There are two ways to address this issue:

      +
        +
      1. By adjusting the transparency of the points via the alpha argument
      2. By jittering the points via geom_jitter()

      The first way of relieving overplotting is by changing the alpha argument in geom_point() which controls the transparency of the points. By default, this value is set to 1. We can change this to any value between 0 and 1 where 0 sets the points to be 100% transparent and 1 sets the points to be 100% opaque. Note how the following function call is identical to the one in Section 3.3, but with alpha = 0.2 added to the geom_point().

      +
      ggplot(data = all_alaska_flights, mapping = aes(x = dep_delay, y = arr_delay)) + 
      +  geom_point(alpha = 0.2)
      +
      +Delay scatterplot with alpha=0.2 +

      +Figure 3.4: Delay scatterplot with alpha=0.2 +

      +
      +

      The key feature to note in Figure 3.4 is that the transparency of the points is cumulative: areas with a high-degree of overplotting are darker, whereas areas with a lower degree are less dark.

      +

      Note that there is no aes() surrounding alpha = 0.2 here. Since we are NOT mapping a variable to an aesthetic but instead are just changing a setting, we don’t need to create a mapping with aes(). In fact, you’ll receive an error if you try to change the second line above to geom_point(aes(alpha = 0.2)).

      +

      The second way of relieving overplotting is to jitter the points a bit. In other words, we are going to add just a bit of random noise to the points to better see them and alleviate some of the overplotting. You can think of “jittering” as shaking the points around a bit on the plot. Let’s illustrate using a simple example first. Say we have a data frame jitter_example with 4 rows of identical value 0 for both x and y:

      +
      jitter_example
      +
      # A tibble: 4 x 2
      +      x     y
      +  <dbl> <dbl>
      +1     0     0
      +2     0     0
      +3     0     0
      +4     0     0
      +

      We display the resulting scatterplot in Figure 3.5; observe that the 4 points are superimposed on top of each other. While we know there are 4 values being plotted, this fact might not be apparent to others.

      Figure 3.5: Regular scatterplot of jitter example data

      In Figure 3.6 we instead display a jittered scatterplot. Since each point is given a random “nudge”, it is now plainly evident that there are four points.

      Figure 3.6: Jittered scatterplot of jitter example data
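      The code behind Figures 3.5 and 3.6 is not shown in the chapter; a minimal sketch, assuming jitter_example is the 4-row data frame printed above, might look like this:

```r
library(ggplot2)
library(tibble)

# Recreate the toy data frame of four identical (0, 0) points:
jitter_example <- tibble(x = rep(0, 4), y = rep(0, 4))

# Figure 3.5: all four points plotted on top of one another
ggplot(data = jitter_example, mapping = aes(x = x, y = y)) +
  geom_point()

# Figure 3.6: each point given a small random nudge so all four are visible
ggplot(data = jitter_example, mapping = aes(x = x, y = y)) +
  geom_jitter(width = 0.01, height = 0.01)
```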

      To create a jittered scatterplot, instead of using geom_point, we use geom_jitter. To specify how much jitter to add, we adjust the width and height arguments. This corresponds to how hard you’d like to shake the plot in units corresponding to those for both the horizontal and vertical variables (in this case, minutes). It is important to add just enough jitter to break any overlap in points, but not so much that we completely obscure the overall pattern in points.

      +
      ggplot(data = all_alaska_flights, mapping = aes(x = dep_delay, y = arr_delay)) + 
      +  geom_jitter(width = 30, height = 30)
      +
      +Jittered delay scatterplot +

      +Figure 3.7: Jittered delay scatterplot +

      +
      +

      Observe how this function call is identical to the one in Subsection 3.3.1, but with geom_point() replaced with geom_jitter(). Also, it is important to note that geom_jitter() is strictly a visualization tool and does not alter the original values saved in the data frame.

      +

      The plot in Figure 3.7 helps us a little bit in getting a sense for the overplotting, but with a relatively large data-set like this one (714 flights), it can be argued that changing the transparency of the points by setting alpha proved more effective.

      +

      Furthermore, we’ll see later on that the two following R commands will yield the exact same plot:

      +
      ggplot(data = all_alaska_flights, mapping = aes(x = dep_delay, y = arr_delay)) + 
      +  geom_jitter(width = 30, height = 30)
      +ggplot(all_alaska_flights, aes(x = dep_delay, y = arr_delay)) + 
      +  geom_jitter(width = 30, height = 30)
      +

      In other words you can drop the data = and mapping = if you keep the order of the two arguments the same. Since the ggplot() function is expecting its first argument data to be a data frame and its second argument to correspond to mapping =, you can omit both and you’ll get the same plot. As you get more and more practice, you’ll likely find yourself not including the specification of the argument like this. But for now to keep things straightforward let’s make it a point to include the data = and mapping =.

      +
      +

      +Learning check +

      +
      +

      (LC3.7) Why is setting the alpha argument value useful with scatterplots? What further information does it give you that a regular scatterplot cannot?

      +

      (LC3.8) After viewing the Figure 3.4 above, give an approximate range of arrival delays and departure delays that occur the most frequently. How has that region changed compared to when you observed the same plot without the alpha = 0.2 set in Figure 3.2?

      +
      + +
      + +
      +
      +

      3.3.3 Summary

      +

      Scatterplots display the relationship between two numerical variables. They are among the most commonly used plots because they can provide an immediate way to see the trend in one variable versus another. However, if you try to create a scatterplot where either one of the two variables is not numerical, you will get strange results. Be careful!

      +

      With medium to large data-sets, you may need to play with either geom_jitter() or the alpha argument in order to get a good feel for relationships in your data. This tweaking is often a fun part of data visualization since you’ll have the chance to see different relationships come about as you make subtle changes to your plots.

      +
      +
      +
      +

      3.4 5NG#2: Linegraphs

      +

      The next of the 5NG is a linegraph. They are most frequently used when the x-axis represents time and the y-axis represents some other numerical variable; such plots are known as time series. Time represents a variable that is connected together by each day following the previous day. In other words, time has a natural ordering. Linegraphs should be avoided when there is not a clear sequential ordering to the explanatory variable, i.e. the x-variable or the predictor variable.

      +

      Our focus now turns to the temp variable in this weather data-set. By

      +
        +
      • Looking over the weather data-set by typing View(weather) in the console.
      • +
      • Running ?weather to bring up the help file.
      • +
      +

      We can see that the temp variable corresponds to hourly temperature (in Fahrenheit) recordings at weather stations near airports in New York City. Instead of considering all hours in 2013 for all three airports in NYC, let’s focus on the hourly temperature at Newark airport (origin code “EWR”) for the first 15 days in January 2013. The weather data frame in the nycflights13 package contains this data, but we first need to filter it to only include those rows that correspond to Newark in the first 15 days of January.

      +
      early_january_weather <- weather %>% 
      +  filter(origin == "EWR" & month == 1 & day <= 15)
      +

      This is similar to the previous use of the filter command in Section 3.3, however we now use the & operator. The above selects only those rows in weather where the originating airport is "EWR" and we are in the first month and the day is from 1 to 15 inclusive.

      +
      +

      +Learning check +

      +
      +

      (LC3.9) Take a look at both the weather and early_january_weather data frames by running View(weather) and View(early_january_weather) in the console. In what respect do these data frames differ?

      +

      (LC3.10) View() the flights data frame again. Why does the time_hour variable uniquely identify the hour of the measurement whereas the hour variable does not?

      +
      + +
      +
      +

      3.4.1 Linegraphs via geom_line

      +

      We plot a linegraph of hourly temperature using geom_line():

      +
      ggplot(data = early_january_weather, mapping = aes(x = time_hour, y = temp)) +
      +  geom_line()
      +
      +Hourly Temperature in Newark for January 1-15, 2013 +

      +Figure 3.8: Hourly Temperature in Newark for January 1-15, 2013 +

      +
      +

      Much as with the ggplot() call in Chapter 3.3.1, we describe the components of the Grammar of Graphics:

      +
        +
      • Within the ggplot() function call, we specify two of the components of the grammar: +
          +
        1. The data frame to be early_january_weather by setting data = early_january_weather
        2. +
        3. The aesthetic mapping by setting aes(x = time_hour, y = temp). Specifically +
            +
          • time_hour (i.e. the time variable) maps to the x position
          • +
          • temp maps to the y position
          • +
        4. +
      • +
      • We add a layer to the ggplot() function call using the + sign
      • +
      • The layer in question specifies the third component of the grammar: the geometric object in question. In this case the geometric object is a line, set by specifying geom_line().
      • +
      +
      +

      +Learning check +

      +
      +

      (LC3.11) Why should linegraphs be avoided when there is not a clear ordering of the horizontal axis?

      +

      (LC3.12) Why are linegraphs frequently used when time is the explanatory variable?

      +

      (LC3.13) Plot a time series of a variable other than temp for Newark Airport in the first 15 days of January 2013.

      +
      + +
      +
      +
      +

      3.4.2 Summary

      +

      Linegraphs, just like scatterplots, display the relationship between two numerical variables. However, the variable on the x-axis (i.e. the explanatory variable) should have a natural ordering, like some notion of time. We can mislead our audience if that isn’t the case.

      +
      +
      +
      +

      3.5 5NG#3: Histograms

      +

      Let’s consider the temp variable in the weather data frame once again, but now unlike with the linegraphs in Chapter 3.4, let’s say we don’t care about the relationship of temperature to time, but rather we care about the (statistical) distribution of temperatures. We could just produce points where each of the different values appear on something similar to a number line:

      Figure 3.9: Plot of Hourly Temperature Recordings from NYC in 2013

      This gives us a general idea of how the values of temp differ. We see that temperatures vary from around 11 up to 100 degrees Fahrenheit. The area between 40 and 60 degrees appears to have more points plotted than outside that range.
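      A minimal sketch of one way to draw such a strip of points, assuming the weather data frame from nycflights13 (this is our approximation of Figure 3.9, not necessarily the code used to make it):

```r
library(ggplot2)
library(nycflights13)

# Plot every hourly temperature at the same height so only the x position varies:
ggplot(data = weather, mapping = aes(x = temp, y = 0)) +
  geom_point(alpha = 0.2)
```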

      +
      +

      3.5.1 Histograms via geom_histogram

      +

      What is commonly produced instead of the above plot is a plot known as a histogram. The histogram shows how many elements of a single numerical variable fall in specified bins. In this case, these bins may correspond to between 0-10°F, 10-20°F, etc. We produce a histogram of the hourly temperatures at all three NYC airports in 2013:

      +
      ggplot(data = weather, mapping = aes(x = temp)) +
      +  geom_histogram()
      +
      `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
      +
      Warning: Removed 1 rows containing non-finite values (stat_bin).
      +
      +Histogram of Hourly Temperature Recordings from NYC in 2013 +

      +Figure 3.10: Histogram of Hourly Temperature Recordings from NYC in 2013 +

      +
      +

      Note here:

      +
        +
      • There is only one variable being mapped in aes(): the single numerical variable temp. You don’t need to compute the y-aesthetic: it gets computed automatically.
      • We set the geometric object to be geom_histogram().
      • We got a warning message that 1 row containing non-finite values was removed. This is due to one of the values of temperature being missing. R is alerting us that this happened.
      • The other message urges us to specify the number of bins we’d like the histogram to use.
      +
      +
      +

      3.5.2 Adjusting the bins

      +

      We can adjust characteristics of the bins in one of two ways:

      +
        +
      1. By adjusting the number of bins via the bins argument
      2. By adjusting the width of the bins via the binwidth argument

      First, we have the power to specify how many bins we would like to put the data into as an argument in the geom_histogram() function. By default, this is chosen to be 30 somewhat arbitrarily; we have received a warning above our plot that this was done.

      +
      ggplot(data = weather, mapping = aes(x = temp)) +
      +  geom_histogram(bins = 60, color = "white")
      +
      +Histogram of Hourly Temperature Recordings from NYC in 2013 - 60 Bins +

      +Figure 3.11: Histogram of Hourly Temperature Recordings from NYC in 2013 - 60 Bins +

      +
      +

      Note the addition of the color argument. If you’d like to be able to more easily differentiate each of the bins, you can specify the color of the outline as done above. You can also adjust the color of the bars by setting the fill argument. Type colors() in your console to see all 657 available colors.

      +
      ggplot(data = weather, mapping = aes(x = temp)) +
      +  geom_histogram(bins = 60, color = "white", fill = "steelblue")
      +
      +Histogram of Hourly Temperature Recordings from NYC in 2013 - 60 Colored Bins +

      +Figure 3.12: Histogram of Hourly Temperature Recordings from NYC in 2013 - 60 Colored Bins +

      +
      +

      Second, instead of specifying the number of bins, we can also specify the width of the bins by using the binwidth argument in the geom_histogram function.

      +
      ggplot(data = weather, mapping = aes(x = temp)) +
      +  geom_histogram(binwidth = 10, color = "white")
      +
      +Histogram of Hourly Temperature Recordings from NYC in 2013 - Binwidth = 10 +

      +Figure 3.13: Histogram of Hourly Temperature Recordings from NYC in 2013 - Binwidth = 10 +

      +
      +
      +

      +Learning check +

      +
      +

      (LC3.14) What does changing the number of bins from 30 to 60 tell us about the distribution of temperatures?

      +

      (LC3.15) Would you classify the distribution of temperatures as symmetric or skewed?

      +

      (LC3.16) What would you guess is the “center” value in this distribution? Why did you make that choice?

      +

      (LC3.17) Is this data spread out greatly from the center or is it close? Why?

      +
      + +
      +
      +
      +

      3.5.3 Summary

      +

      Histograms, unlike scatterplots and linegraphs, present information on only a single numerical variable. In particular they are visualizations of the (statistical) distribution of values.

      +
      +
      +
      +

      3.6 Facets

      +

      Before continuing the 5NG, we briefly introduce a new concept called faceting. Faceting is used when we’d like to create small multiples of the same plot over a different categorical variable. By default, all of the small multiples will have the same vertical axis.

      +

      For example, suppose we were interested in looking at how the temperature histograms we saw in Chapter 3.5 varied by month. This is what is meant by “the distribution of a variable over another variable”: temp is one variable and month is the other variable. In order to look at histograms of temp for each month, we add a layer facet_wrap(~ month). You can also specify how many rows you’d like the small multiple plots to be in using nrow or how many columns using ncol inside of facet_wrap.

      +
      ggplot(data = weather, mapping = aes(x = temp)) +
      +  geom_histogram(binwidth = 5, color = "white") +
      +  facet_wrap(~ month, nrow = 4)
      +
      +Faceted histogram +

      +Figure 3.14: Faceted histogram +

      +
      +

      Note the use of the ~ before month in facet_wrap. The tilde (~) is required and you’ll receive the error Error in as.quoted(facets) : object 'month' not found if you don’t include it before month here.

      +

      As we might expect, the temperature tends to increase as summer approaches and then decrease as winter approaches.

      +
      +

      +Learning check +

      +
      +

      (LC3.18) What other things do you notice about the faceted plot above? How does a faceted plot help us see relationships between two variables?

      +

      (LC3.19) What do the numbers 1-12 correspond to in the plot above? What about 25, 50, 75, 100?

      +

      (LC3.20) For which types of data-sets would these types of faceted plots not work well in comparing relationships between variables? Give an example describing the nature of these variables and other important characteristics.

      +

      (LC3.21) Does the temp variable in the weather data-set have a lot of variability? Why do you say that?

      +
      + +
      +
      +
      +

      3.7 5NG#4: Boxplots

      +

      While using faceted histograms can provide a way to compare distributions of a numerical variable split by groups of a categorical variable as in Section 3.6, an alternative plot called a boxplot (also called a side-by-side boxplot) achieves the same task and is frequently preferred. The boxplot uses the information provided in the five-number summary referred to in Appendix A. It gives a way to compare this summary information across the different levels of a categorical variable.

      +
      +

      3.7.1 Boxplots via geom_boxplot

      +

      Let’s create a boxplot to compare the monthly temperatures as we did above with the faceted histograms.

      +
      ggplot(data = weather, mapping = aes(x = month, y = temp)) +
      +  geom_boxplot()
      +
      +Invalid boxplot specification +

      +Figure 3.15: Invalid boxplot specification +

      +
      +
      Warning messages:
      +1: Continuous x aesthetic -- did you forget aes(group=...)? 
      +2: Removed 1 rows containing non-finite values (stat_boxplot). 
      +

      Note the set of warnings that is given here. The second warning corresponds to missing values in the data frame and it is turned off on subsequent plots. Let’s focus on the first warning.

      +

      Observe that this plot does not look like what we were expecting. We were expecting to see the distribution of temperatures for each month (so 12 different boxplots). The first warning is letting us know that we are plotting a numerical, and not categorical variable, on the x-axis. This gives us the overall boxplot without any other groupings. We can get around this by introducing a new function for our x variable:

      +
      ggplot(data = weather, mapping = aes(x = factor(month), y = temp)) +
      +  geom_boxplot()
      +
      +Month by temp boxplot +

      +Figure 3.16: Month by temp boxplot +

      +
      +

      We have introduced a new function called factor() which converts a numerical variable to a categorical one. This is necessary as geom_boxplot requires the x variable to be a categorical variable, which the variable month is not. So after applying factor(month), month goes from having numerical values 1, 2, …, 12 to having labels “1”, “2”, …, “12”. The resulting Figure 3.16 shows 12 separate “box and whiskers” plots with the following features:

      +
        +
      • The “box” portions of this plot represent the 25th percentile AKA the 1st quartile, the median AKA the 50th percentile AKA the 2nd quartile, and the 75th percentile AKA the 3rd quartile.
      • +
      • The height of each box, i.e. the value of the 3rd quartile minus the value of the 1st quartile, is called the interquartile range (\(IQR\)). It is a measure of spread of the middle 50% of values, with longer boxes indicating more variability.
      • +
      • The “whisker” portions of these plots extend out from the bottoms and tops of the boxes and represent points less than the 25th percentile and greater than the 75th percentiles respectively. They’re set to extend out no more than \(1.5 \times IQR\) units away from either end of the boxes. We say “no more than” because the ends of the whiskers represent the first observed values of temp to be within the range of the whiskers. The length of these whiskers show how the data outside the middle 50% of values vary, with longer whiskers indicating more variability.
      • +
      • The dots representing values falling outside the whiskers are called outliers. It is important to keep in mind that the definition of an outlier is somewhat arbitrary and not absolute. In this case, they are defined by the length of the whiskers, which are no more than \(1.5 \times IQR\) units long.
      • +
      +

      Looking at this plot we can see, as expected, that summer months (6 through 8) have higher median temperatures as evidenced by the higher solid lines in the middle of the boxes. We can easily compare temperatures across months by drawing imaginary horizontal lines across the plot. Furthermore, the height of the 12 boxes as quantified by the interquartile ranges are informative too; they tell us about variability, or spread, of temperatures recorded in a given month.

      +

      But to really bring home what boxplots show, let’s focus only on the month of November’s 2141 temperature recordings.

      Figure 3.17: November boxplot

      Now let’s plot all 2141 temperature recordings for November on top of the boxplot in Figure 3.18.

      Figure 3.18: November boxplot with points
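      A sketch of how Figures 3.17 and 3.18 could be produced (our assumption; the chapter does not show this code):

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

november <- weather %>% filter(month == 11)

# Figure 3.17: the November boxplot on its own
ggplot(data = november, mapping = aes(x = factor(month), y = temp)) +
  geom_boxplot()

# Figure 3.18: the same boxplot with all of November's hourly recordings overlaid
ggplot(data = november, mapping = aes(x = factor(month), y = temp)) +
  geom_boxplot() +
  geom_jitter(width = 0.1, height = 0, alpha = 0.2)
```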

      What the boxplot does is summarize the 2141 points for you, in particular:

      +
        +
      1. 25% of points (about 534 observations) fall below the bottom edge of the box, which is the first quartile of 35.96 degrees Fahrenheit (2.2 degrees Celsius). In other words, 25% of observations were colder than 35.96 degrees Fahrenheit.
      2. 25% of points fall between the bottom edge of the box and the solid middle line, which is the median of 44.96 degrees Fahrenheit (7.2 degrees Celsius). In other words, 25% of observations were between 35.96 and 44.96 degrees Fahrenheit.
      3. 25% of points fall between the solid middle line and the top edge of the box, which is the third quartile of 51.98 degrees Fahrenheit (11.1 degrees Celsius). In other words, 25% of observations were between 44.96 and 51.98 degrees Fahrenheit.
      4. 25% of points fall above the top edge of the box. In other words, 25% of observations were warmer than 51.98 degrees Fahrenheit.
      5. The middle 50% of points lie within the interquartile range of 16.02 degrees Fahrenheit.
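      These quoted values can be checked directly; the following is our own sketch, not part of the original chapter:

```r
library(dplyr)
library(nycflights13)

# Quartiles of November temperatures (na.rm guards against the one missing temp value)
weather %>%
  filter(month == 11) %>%
  summarize(first_quartile = quantile(temp, 0.25, na.rm = TRUE),
            median = median(temp, na.rm = TRUE),
            third_quartile = quantile(temp, 0.75, na.rm = TRUE),
            IQR = third_quartile - first_quartile)
```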

      +Learning check +

      +
      +

      (LC3.22) What does the dot at the bottom of the plot for May correspond to? Explain what might have occurred in May to produce this point.

      +

      (LC3.23) Which months have the highest variability in temperature? What reasons do you think this is?

      +

      (LC3.24) We looked at the distribution of a numerical variable over a categorical variable here with this boxplot. Why can’t we look at the distribution of one numerical variable over the distribution of another numerical variable? Say, temperature across pressure, for example?

      +

      (LC3.25) Boxplots provide a simple way to identify outliers. Why may outliers be easier to identify when looking at a boxplot instead of a faceted histogram?

      +
      + +
      +
      +
      +

      3.7.2 Summary

      +

      Boxplots provide a way to compare and contrast the distribution of one quantitative variable across multiple levels of one categorical variable. One can see where the median falls across the different groups by looking at the center line in the box. To see how spread out the variable is across the different groups, look at both the width of the box and also how far the lines stretch vertically from the box. (If the lines stretch far from the box but the box has a small width, the variability of the values closer to the center is much smaller than the variability of the outer ends of the variable.) Outliers are even more easily identified when looking at a boxplot than when looking at a histogram.

      +
      +
      +
      +

      3.8 5NG#5: Barplots

      +

      Both histograms and boxplots represent ways to visualize the variability of numerical variables. Another common task is to present the distribution of a categorical variable. This is a simpler task, focused on how many elements from the data fall into different categories of the categorical variable. Often the best way to visualize these different counts (also known as frequencies) is via a barplot, also known as a barchart.

      +

      One complication, however, is how your data is represented: is the categorical variable of interest “pre-counted” or not? For example, run the following code in your Console. This code manually creates two data frames representing a collection of fruit: 3 apples and 2 oranges.

      +
      fruits <- data_frame(
      +  fruit = c("apple", "apple", "apple", "orange", "orange")
      +)
      +fruits_counted <- data_frame(
      +  fruit = c("apple", "orange"),
      +  number = c(3, 2)
      +)
      +

      We see both the fruits and fruits_counted data frames represent the same collection of fruit. Whereas fruits just lists the fruit individually:

      Table 3.3: Fruits

      | fruit  |
      |--------|
      | apple  |
      | apple  |
      | apple  |
      | orange |
      | orange |

      fruits_counted has a variable number which represents the pre-counted values of each fruit.

      Table 3.4: Fruits (Pre-Counted)

      | fruit  | number |
      |--------|--------|
      | apple  | 3      |
      | orange | 2      |

      3.8.1 Barplots via geom_bar/geom_col

      +

      Let’s generate barplots using these two different representations of the same basket of fruit: 3 apples and 2 oranges. Using the not pre-counted data fruits from Table 3.3:

      +
      ggplot(data = fruits, mapping = aes(x = fruit)) +
      +  geom_bar()
      +
      +Barplot when counts are not pre-counted +

      +Figure 3.19: Barplot when counts are not pre-counted +

      +
      +

      and using the pre-counted data fruits_counted from Table 3.4:

      +
      ggplot(data = fruits_counted, mapping = aes(x = fruit, y = number)) +
      +  geom_col()
      +
      +Barplot when counts are pre-counted +

      +Figure 3.20: Barplot when counts are pre-counted +

      +
      +

      Compare the barplots in Figures 3.19 and 3.20, which are identical, but are based on the two different data frames. Observe that:

      +
        +
      • The code that generates Figure 3.19 based on fruits does not map a variable to the y aesthetic and uses geom_bar().
      • +
      • The code that generates Figure 3.20 based on fruits_counted maps the number variable to the y aesthetic and uses geom_col()
      • +
      +

      Stating the above differently:

      +
        +
      • When the categorical variable you want to plot is not pre-counted in your data frame you need to use geom_bar().
      • +
      • When the categorical variable is pre-counted (in the above fruits_counted example in the variable number), you need to use geom_col() with the y aesthetic explicitly mapped.
      • +
      +

      Please note that understanding this difference is one of ggplot2’s trickier aspects that causes the most confusion, and fortunately this is as complicated as our use of ggplot2 is going to get. Let’s consider a different distribution: the distribution of airlines that flew out of New York City in 2013. Here we explore the number of flights from each airline/carrier. This can be plotted by invoking the geom_bar function in ggplot2:

      +
      ggplot(data = flights, mapping = aes(x = carrier)) +
      +  geom_bar()
      +
      +Number of flights departing NYC in 2013 by airline using geom_bar +

      +Figure 3.21: Number of flights departing NYC in 2013 by airline using geom_bar +

      +
      +

      To get an understanding of what the names of these airlines are corresponding to these carrier codes, we can look at the airlines data frame in the nycflights13 package.

      +
      airlines
      | carrier | name                        |
      |---------|-----------------------------|
      | 9E      | Endeavor Air Inc.           |
      | AA      | American Airlines Inc.      |
      | AS      | Alaska Airlines Inc.        |
      | B6      | JetBlue Airways             |
      | DL      | Delta Air Lines Inc.        |
      | EV      | ExpressJet Airlines Inc.    |
      | F9      | Frontier Airlines Inc.      |
      | FL      | AirTran Airways Corporation |
      | HA      | Hawaiian Airlines Inc.      |
      | MQ      | Envoy Air                   |
      | OO      | SkyWest Airlines Inc.       |
      | UA      | United Air Lines Inc.       |
      | US      | US Airways Inc.             |
      | VX      | Virgin America              |
      | WN      | Southwest Airlines Co.      |
      | YV      | Mesa Airlines Inc.          |

      Going back to our barplot, we see that United Air Lines, JetBlue Airways, and ExpressJet Airlines had the most flights depart New York City in 2013. To get the actual number of flights by each airline we can use the group_by(), summarize(), and n() functions in the dplyr package on the carrier variable in flights, which we will introduce formally in Chapter 5.

      +
      flights_table <- flights %>% 
      +  group_by(carrier) %>% 
      +  summarize(number = n())
      +flights_table
      | carrier | number |
      |---------|--------|
      | 9E      | 18460  |
      | AA      | 32729  |
      | AS      | 714    |
      | B6      | 54635  |
      | DL      | 48110  |
      | EV      | 54173  |
      | F9      | 685    |
      | FL      | 3260   |
      | HA      | 342    |
      | MQ      | 26397  |
      | OO      | 32     |
      | UA      | 58665  |
      | US      | 20536  |
      | VX      | 5162   |
      | WN      | 12275  |
      | YV      | 601    |

      In this table, the counts of the carriers are pre-counted. To create a barplot using the data frame flights_table, we

      +
        +
      • use geom_col() instead of geom_bar()
      • +
      • map the y aesthetic to the variable number.
      • +
      +

      Compare this barplot using geom_col in Figure 3.22 with the earlier barplot using geom_bar in Figure 3.21. They are identical. However the input data we used for these are different.

      +
      ggplot(data = flights_table, mapping = aes(x = carrier, y = number)) +
      +  geom_col()
      +
      +Number of flights departing NYC in 2013 by airline using geom_col +

      +Figure 3.22: Number of flights departing NYC in 2013 by airline using geom_col +

      +
      + +
      +

      +Learning check +

      +
      +

      (LC3.26) Why are histograms inappropriate for visualizing categorical variables?

      +

      (LC3.27) What is the difference between histograms and barplots?

      +

      (LC3.28) How many Envoy Air flights departed NYC in 2013?

      +

      (LC3.29) What was the seventh highest airline in terms of departed flights from NYC in 2013? How could we better present the table to get this answer quickly?
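      One way to re-present the table for this learning check (our sketch, not the book’s solution) is to sort it by descending counts so the seventh row answers the question directly:

```r
library(dplyr)
# Sort the pre-counted table from most to fewest departing flights
flights_table %>%
  arrange(desc(number))
```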

      +
      + +
      +
      +
      +

      3.8.2 Must avoid pie charts!

      +

      Unfortunately, one of the most common plots seen today for categorical data is the pie chart. While they may seem harmless enough, they actually present a problem in that humans are unable to judge angles well. As Naomi Robbins describes in her book “Creating More Effective Graphs” (Robbins 2013), we overestimate angles greater than 90 degrees and we underestimate angles less than 90 degrees. In other words, it is difficult for us to determine the relative size of one piece of the pie compared to another.

      +

      Let’s examine our previous barplot example on the number of flights departing NYC by airline. This time we will use a pie chart. As you review this chart, try to identify

      +
        +
      • how much larger the portion of the pie is for ExpressJet Airlines (EV) compared to US Airways (US),
      • +
      • what the third largest carrier is in terms of departing flights, and
      • +
      • how many carriers have fewer flights than United Airlines (UA)?
      • +
      +
      Figure 3.23: The dreaded pie chart
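      For reference, a pie chart like Figure 3.23 could be sketched from the pre-counted flights_table created above (this is our assumed construction, not necessarily the book’s code):

```r
library(ggplot2)
# A single stacked bar converted to polar coordinates gives a pie chart
ggplot(data = flights_table, mapping = aes(x = "", y = number, fill = carrier)) +
  geom_col(width = 1) +
  coord_polar(theta = "y")
```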

      While it is quite easy to look back at the barplot to get the answer to these questions, it’s quite difficult to get the answers correct when looking at the pie graph. Barplots can always present the information in a way that is easier for the eye to determine relative position. There may be one exception from Nathan Yau at FlowingData.com but we will leave this for the reader to decide:

      +
      +The only good pie chart +

      +Figure 3.24: The only good pie chart +

      +
      +
      +

      +Learning check +

      +
      +

      (LC3.30) Why should pie charts be avoided and replaced by barplots?

      +

      (LC3.31) What is your opinion as to why pie charts continue to be used?

      +
      + +
      +
      +
      +

      3.8.3 Using barplots to compare two categorical variables

      +

      Barplots are the go-to way to visualize the frequency of different categories of a categorical variable. They make it easy to order the counts and to compare the frequencies of one group to another. Another use of barplots (unfortunately, sometimes inappropriately and confusingly) is to compare two categorical variables together. Let’s examine the distribution of outgoing flights from NYC by carrier and airport.

      +

      We begin by getting the names of the airports in NYC that were included in the flights data-set. Here, we preview the inner_join() function from Chapter 5. This function will join the data frame flights with the data frame airports by matching rows that have the same airport code. However, in flights the airport code is included in the origin variable whereas in airports the airport code is included in the faa variable. We will revisit such examples in Section 5.8 on joining data-sets.

      +
      flights_namedports <- flights %>% 
      +  inner_join(airports, by = c("origin" = "faa"))
      +

      After running View(flights_namedports), we see that name now corresponds to the name of the airport as referenced by the origin variable. We will now plot carrier as the horizontal variable. When we specify geom_bar, it will specify count as being the vertical variable. A new addition here is fill = name. Look over what was produced from the plot to get an idea of what this argument gives.

      +
      ggplot(data = flights_namedports, mapping = aes(x = carrier, fill = name)) +
      +  geom_bar()
      +
      +Stacked barplot comparing the number of flights by carrier and airport +

      +Figure 3.25: Stacked barplot comparing the number of flights by carrier and airport +

      +
      +

      This plot is what is known as a stacked barplot. While simple to make, it often leads to many problems. For example in this plot, it is difficult to compare the heights of the different colors (corresponding to the number of flights from each airport) between the bars (corresponding to the different carriers).

      +

      Note that fill is an aesthetic just like x is an aesthetic, and thus must be included within the parentheses of the aes() mapping. The following code, where the fill aesthetic is specified on the outside will yield an error. This is a fairly common error that new ggplot users make:

      +
      ggplot(data = flights_namedports, mapping = aes(x = carrier), fill = name) +
      +  geom_bar()
      +
      +

      +Learning check +

      +
      +

      (LC3.32) What kinds of questions are not easily answered by looking at the above figure?

      +

      (LC3.33) What can you say, if anything, about the relationship between airline and airport in NYC in 2013 in regards to the number of departing flights?

      +
      + +
      +

      Another variation on the stacked barplot is the side-by-side barplot also called a dodged barplot.

      +
      ggplot(data = flights_namedports, mapping = aes(x = carrier, fill = name)) +
      +  geom_bar(position = "dodge")
      +
      +Side-by-side AKA dodged barplot comparing the number of flights by carrier and airport +

      +Figure 3.26: Side-by-side AKA dodged barplot comparing the number of flights by carrier and airport +

      +
      +
      +

      +Learning check +

      +
      +

      (LC3.34) Why might the side-by-side (AKA dodged) barplot be preferable to a stacked barplot in this case?

      +

      (LC3.35) What are the disadvantages of using a side-by-side (AKA dodged) barplot, in general?

      +
      + +
      +

      Lastly, an often preferred type of barplot is the faceted barplot. We already saw this concept of faceting and small multiples in Section 3.6. This gives us a nicer way to compare the distributions across both carrier and airport/name.

      +
      ggplot(data = flights_namedports, mapping = aes(x = carrier, fill = name)) +
      +  geom_bar() +
      +  facet_wrap(~ name, ncol = 1)
      +
      +Faceted barplot comparing the number of flights by carrier and airport +

      +Figure 3.27: Faceted barplot comparing the number of flights by carrier and airport +

      +
      +
      +

      +Learning check +

      +
      +

      (LC3.36) Why is the faceted barplot preferred to the side-by-side and stacked barplots in this case?

      +

      (LC3.37) What information about the different carriers at different airports is more easily seen in the faceted barplot?

      +
      + +
      +
      +
      +

      3.8.4 Summary

      +

      Barplots are the preferred way of displaying categorical variables. They are easy to understand and make it easy to compare across groups of a categorical variable. When dealing with more than one categorical variable, faceted barplots are frequently preferred over side-by-side or stacked barplots. Stacked barplots are sometimes nice to look at, but it is quite difficult to compare across levels since the bars are all of different sizes. Side-by-side barplots can provide an improvement on this, but the issue of comparing across groups still must be dealt with.

      +
      +
      +
      +

      3.9 Conclusion

      +
      +

      3.9.1 Putting it all together

      +

      Let’s recap all five of the Five Named Graphs (5NG) in Table 3.5 summarizing their differences. Using these 5NG, you’ll be able to visualize the distributions and relationships of variables contained in a wide array of datasets. This will be even more the case as we start to map more variables to more of each geometric object’s aesthetic attribute options, further unlocking the awesome power of the ggplot2 package.

Table 3.5: Summary of 5NG

|   | Named graph | Shows | Geometric object | Notes |
|---|---|---|---|---|
| 1 | Scatterplot | Relationship between 2 numerical variables | geom_point() | |
| 2 | Linegraph | Relationship between 2 numerical variables | geom_line() | Used when there is a sequential order to the x-variable, e.g. time |
| 3 | Histogram | Distribution of 1 numerical variable | geom_histogram() | Faceted histogram shows distribution of 1 numerical variable split by 1 categorical variable |
| 4 | Boxplot | Distribution of 1 numerical variable split by 1 categorical variable | geom_boxplot() | |
| 5 | Barplot | Distribution of 1 categorical variable | geom_bar() when counts are not pre-counted; geom_col() when counts are pre-counted | Stacked & dodged barplots show distribution of 2 categorical variables |
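To illustrate the geom_bar() vs geom_col() distinction noted in Table 3.5, here is a minimal sketch. It uses a small hypothetical data frame carrier_counts of pre-counted flights per carrier; this data frame and its values are made up for illustration and are not one of the book's datasets.

library(ggplot2)

# Hypothetical pre-counted data: one row per carrier, with a count column n
carrier_counts <- data.frame(carrier = c("AA", "DL", "UA"),
                             n = c(100, 150, 80))

# geom_col() plots the pre-counted values in n directly
ggplot(carrier_counts, aes(x = carrier, y = n)) +
  geom_col()

# geom_bar() counts rows for you, so it is applied to raw, un-aggregated data
# (one row per flight), mapping only x, e.g.:
# ggplot(flights, aes(x = carrier)) + geom_bar()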

      3.9.2 Review questions


      Review questions have been designed using the fivethirtyeight R package (Kim, Ismay, and Chunn 2019) with links to the corresponding FiveThirtyEight.com articles in our free DataCamp course Effective Data Storytelling using the tidyverse. The material in this chapter is covered in the chapters of the DataCamp course available below:


      3.9.3 What’s to come?


      In Chapter 4, we’ll introduce the concept of “tidy data” and how it is used as a key data format for all the packages we use in this textbook. You’ll see that the concept appears to be simple, but actually can be a little challenging to decipher without careful practice. We’ll also investigate how to import CSV (comma-separated value) files into R using the readr package.


      3.9.4 Resources


An excellent resource as you begin to create plots using the ggplot2 package is a cheatsheet that RStudio has put together entitled "Data Visualization with ggplot2", available:

• by clicking here or
• by clicking the RStudio Menu Bar -> Help -> Cheatsheets -> "Data Visualization with ggplot2"

      This cheatsheet covers more than what we’ve discussed in this chapter but provides nice visual descriptions of what each function produces.


      3.9.5 Script of R code


      An R script file of all R code used in this chapter is available here.


      6 Basic Regression


Now that we are equipped with data visualization skills from Chapter 3, an understanding of the "tidy" data format from Chapter 4, and data wrangling skills from Chapter 5, let's proceed with data modeling. The fundamental premise of data modeling is to make explicit the relationship between:

• an outcome variable \(y\), also called a dependent variable, and
• an explanatory/predictor variable \(x\), also called an independent variable or covariate.

      Another way to state this is using mathematical terminology: we will model the outcome variable \(y\) as a function of the explanatory/predictor variable \(x\). Why do we have two different labels, explanatory and predictor, for the variable \(x\)? That’s because roughly speaking data modeling can be used for two purposes:

1. Modeling for prediction: You want to predict an outcome variable \(y\) based on the information contained in a set of predictor variables. You don't care so much about understanding how all the variables relate and interact, but so long as you can make good predictions about \(y\), you're fine. For example, if we know many individuals' risk factors for lung cancer, such as smoking habits and age, can we predict whether or not they will develop lung cancer? Here we wouldn't care so much about distinguishing the degree to which the different risk factors contribute to lung cancer, but instead only on whether or not they could be put together to make reliable predictions.
2. Modeling for explanation: You want to explicitly describe the relationship between an outcome variable \(y\) and a set of explanatory variables, determine the significance of any found relationships, and have measures summarizing these. Continuing our example from above, we would now be interested in describing the individual effects of the different risk factors and quantifying the magnitude of these effects. One reason could be to design an intervention to reduce lung cancer cases in a population, such as targeting smokers of a specific age group with an advertisement for smoking cessation programs. In this book, we'll focus more on this latter purpose.

      Data modeling is used in a wide variety of fields, including statistical inference, causal inference, artificial intelligence, and machine learning. There are many techniques for data modeling, such as tree-based models, neural networks and deep learning, and supervised learning. In this chapter, we’ll focus on one particular technique: linear regression, one of the most commonly-used and easy-to-understand approaches to modeling. Recall our discussion in Subsection 2.4.3 on numerical and categorical variables. Linear regression involves:

• an outcome variable \(y\) that is numerical and
• explanatory variables \(\vec{x}\) that are either numerical or categorical.

      With linear regression there is always only one numerical outcome variable \(y\) but we have choices on both the number and the type of explanatory variables \(\vec{x}\) to use. We’re going to cover the following regression scenarios:

• In this current chapter on basic regression, we'll always have only one explanatory variable.
  • In Section 6.1, this explanatory variable will be a single numerical explanatory variable \(x\). This scenario is known as simple linear regression.
  • In Section 6.2, this explanatory variable will be a categorical explanatory variable \(x\).
• In the next chapter, Chapter 7 on multiple regression, we'll have more than one explanatory variable:
  • We'll focus on two numerical explanatory variables \(x_1\) and \(x_2\) in Section 7.1. This can be denoted as \(\vec{x}\) as well since we have more than one explanatory variable.
  • We'll use one numerical and one categorical explanatory variable in Section 7.1. We'll also introduce interaction models here; there, the effect of one explanatory variable depends on the value of another.

      We’ll study all four of these regression scenarios using real data, all easily accessible via R packages!


      Needed packages


In this chapter we introduce a new package, moderndive, that accompanies this ModernDive book. It includes functions tailored to linear regression, other useful functions, and data used later in the book. Let's now load all the packages needed for this chapter. If needed, read Section 2.3 for information on how to install and load R packages.

library(ggplot2)
library(dplyr)
library(moderndive)
library(gapminder)
library(skimr)

      DataCamp


      The introductory basic regression analysis below was the inspiration for a large part of ModernDive co-author Albert Y. Kim’s DataCamp course “Modeling with Data in the Tidyverse.” If you’re interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters are Chapter 1 “Introduction to Modeling” and Chapter 2 “Modeling with Basic Regression”.


      6.1 One numerical explanatory variable


Why do some professors and instructors at universities and colleges get high teaching evaluations from students while others don't? What factors can explain these differences? Are there biases? These are questions that are of interest to university/college administrators, as teaching evaluations are among the many criteria considered in determining which professors and instructors should get promotions. Researchers at the University of Texas in Austin, Texas (UT Austin) tried to answer this question: what factors can explain differences in instructors' teaching evaluation scores? To this end, they collected information on \(n = 463\) instructors. A full description of the study can be found at openintro.org.


      We’ll keep things simple for now and try to explain differences in instructor evaluation scores as a function of one numerical variable: their “beauty score.” The specifics on how this score was calculated will be described shortly.


      Could it be that instructors with higher beauty scores also have higher teaching evaluations? Could it be instead that instructors with higher beauty scores tend to have lower teaching evaluations? Or could it be there is no relationship between beauty score and teaching evaluations?


We'll address these questions by modeling the relationship between these two variables with a particular kind of linear regression called simple linear regression. Simple linear regression is the most basic form of linear regression. With it we have

1. A numerical outcome variable \(y\). In this case, their teaching score.
2. A single numerical explanatory variable \(x\). In this case, their beauty score.

      6.1.1 Exploratory data analysis


      A crucial step before doing any kind of modeling or analysis is performing an exploratory data analysis, or EDA, of all our data. Exploratory data analysis can give you a sense of the distribution of the data, and whether there are outliers and/or missing values. Most importantly, it can inform how to build your model. There are many approaches to exploratory data analysis; here are three:

1. Most fundamentally: just looking at the raw values, in a spreadsheet for example. While this may seem trivial, many people ignore this crucial step!
2. Computing summary statistics like means, medians, and standard deviations.
3. Creating data visualizations.

      Let’s load the data, select only a subset of the variables, and look at the raw values. Recall you can look at the raw values by running View() in the console in RStudio to pop-up the spreadsheet viewer with the data frame of interest as the argument to View(). Here, however, we present only a snapshot of five randomly chosen rows:

evals_ch6 <- evals %>%
  select(score, bty_avg, age)

evals_ch6 %>%
  sample_n(5)
Table 6.1: Random sample of 5 instructors

| score | bty_avg | age |
|---|---|---|
| 3.6 | 6.67 | 34 |
| 4.9 | 3.50 | 43 |
| 3.3 | 2.33 | 47 |
| 4.4 | 4.67 | 33 |
| 4.7 | 3.67 | 60 |

      While a full description of each of these variables can be found at openintro.org, let’s summarize what each of these variables represents.

1. score: Numerical variable of the average teaching score based on students' evaluations between 1 and 5. This is the outcome variable \(y\) of interest.
2. bty_avg: Numerical variable of the average "beauty" rating based on a panel of 6 students' scores between 1 and 10. This is the numerical explanatory variable \(x\) of interest. Here 1 corresponds to a low beauty rating and 10 to a high beauty rating.
3. age: A numerical variable of age in years as an integer value.

      Another way to look at the raw values is using the glimpse() function, which gives us a slightly different view of the data. We see Observations: 463, indicating that there are 463 observations in evals, each corresponding to a particular instructor at UT Austin. Expressed differently, each row in the data frame evals corresponds to one of 463 instructors.

glimpse(evals_ch6)

Observations: 463
Variables: 3
$ score   <dbl> 4.7, 4.1, 3.9, 4.8, 4.6, 4.3, 2.8, 4.1, 3.4, 4.5, 3.8, 4.5, 4…
$ bty_avg <dbl> 5.00, 5.00, 5.00, 5.00, 3.00, 3.00, 3.00, 3.33, 3.33, 3.17, 3…
$ age     <int> 36, 36, 36, 36, 59, 59, 59, 51, 51, 40, 40, 40, 40, 40, 40, 4…

Since both the outcome variable score and the explanatory variable bty_avg are numerical, we can compute summary statistics about them such as the mean, median, and standard deviation. Let's take evals_ch6, select only the two variables of interest for now, and pipe the result into the skim() function from the skimr package. This function quickly "skims" the data and returns the following summary information about each variable.

evals_ch6 %>%
  select(score, bty_avg) %>%
  skim()

Skim summary statistics
 n obs: 463 
 n variables: 2 

── Variable type:numeric ─────
 variable missing complete   n mean   sd   p0  p25  p50 p75 p100     hist
  bty_avg       0      463 463 4.42 1.53 1.67 3.17 4.33 5.5 8.17 ▂▅▅▇▃▃▂▁
    score       0      463 463 4.17 0.54 2.3  3.8  4.3  4.6 5    ▁▁▂▃▅▇▇▆

In this case, for our two numerical variables bty_avg (beauty score) and score (teaching score), it returns:

• missing: the number of missing values
• complete: the number of non-missing or complete values
• n: the total number of values
• mean: the average
• sd: the standard deviation
• p0: the 0th percentile: the value at which 0% of observations are smaller than it. This is also known as the minimum
• p25: the 25th percentile: the value at which 25% of observations are smaller than it. This is also known as the 1st quartile
• p50: the 50th percentile: the value at which 50% of observations are smaller than it. This is also known as the 2nd quartile and more commonly the median
• p75: the 75th percentile: the value at which 75% of observations are smaller than it. This is also known as the 3rd quartile
• p100: the 100th percentile: the value at which 100% of observations are smaller than it. This is also known as the maximum
• hist: a quick snapshot of the histogram

      We get an idea of how the values in both variables are distributed. For example, the mean teaching score was 4.17 out of 5 whereas the mean beauty score was 4.42 out of 10. Furthermore, the middle 50% of teaching scores were between 3.80 and 4.6 (the first and third quartiles) while the middle 50% of beauty scores were between 3.17 and 5.5 out of 10.
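To double-check these quartiles directly, one could compute them with base R's quantile() function (a quick verification sketch, not code from the book); the results should agree, up to rounding, with the p25, p50, and p75 columns reported by skim():

# Quartiles of teaching score and beauty score
quantile(evals_ch6$score, probs = c(0.25, 0.5, 0.75))
quantile(evals_ch6$bty_avg, probs = c(0.25, 0.5, 0.75))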


      The skim() function however only returns what are called univariate summaries, i.e. summaries about single variables at a time. Since we are considering the relationship between two numerical variables, it would be nice to have a summary statistic that simultaneously considers both variables. The correlation coefficient is a bivariate summary statistic that fits this bill. Coefficients in general are quantitative expressions of a specific property of a phenomenon. A correlation coefficient is a quantitative expression between -1 and 1 that summarizes the strength of the linear relationship between two numerical variables:

• -1 indicates a perfect negative relationship: as the value of one variable goes up, the value of the other variable tends to go down.
• 0 indicates no relationship: the values of both variables go up/down independently of each other.
• +1 indicates a perfect positive relationship: as the value of one variable goes up, the value of the other variable tends to go up as well.

      Figure 6.1 gives examples of different correlation coefficient values for hypothetical numerical variables \(x\) and \(y\). We see that while for a correlation coefficient of -0.75 there is still a negative relationship between \(x\) and \(y\), it is not as strong as the negative relationship between \(x\) and \(y\) when the correlation coefficient is -1.

Figure 6.1: Different correlation coefficients

The correlation coefficient is computed using the get_correlation() function in the moderndive package. In this case, the inputs to the function are the two numerical variables for which we want to calculate the correlation coefficient. We place the name of the response variable on the left-hand side of the ~ ("tilde") and the explanatory variable on the right-hand side. We will use this same "formula" syntax with regression later in this chapter.

evals_ch6 %>%
  get_correlation(formula = score ~ bty_avg)

# A tibble: 1 x 1
  correlation
        <dbl>
1       0.187

      The correlation coefficient can also be computed using the cor() function, where in this case the inputs to the function are the two numerical variables from which we want to calculate the correlation coefficient. Recall from Subsection 2.4.3 that the $ pulls out specific variables from a data frame:

cor(x = evals_ch6$bty_avg, y = evals_ch6$score)

[1] 0.187

      In our case, the correlation coefficient of 0.187 indicates that the relationship between teaching evaluation score and beauty average is “weakly positive.” There is a certain amount of subjectivity in interpreting correlation coefficients, especially those that aren’t close to -1, 0, and 1. For help developing such intuition and more discussion on the correlation coefficient see Subsection 6.3.1 below.


Let's now proceed by visualizing this data. Since both the score and bty_avg variables are numerical, a scatterplot is an appropriate graph to visualize this data. Let's do this using geom_point(), set informative axis labels and a title, and display the result in Figure 6.2.

ggplot(evals_ch6, aes(x = bty_avg, y = score)) +
  geom_point() +
  labs(x = "Beauty Score", y = "Teaching Score",
       title = "Relationship of teaching and beauty scores")
Figure 6.2: Instructor evaluation scores at UT Austin

      Observe the following:

1. Most "beauty" scores lie between 2 and 8.
2. Most teaching scores lie between 3 and 5.
3. Recall our earlier computation of the correlation coefficient, which describes the strength of the linear relationship between two numerical variables. Looking at Figure 6.3, it is not immediately apparent that these two variables are positively related. This is to be expected given the positive, but rather weak (close to 0), correlation coefficient of 0.187.

      Before we continue, we bring to light an important fact about this dataset: it suffers from overplotting. Recall from the data visualization Subsection 3.3.2 that overplotting occurs when several points are stacked directly on top of each other thereby obscuring the number of points. For example, let’s focus on the 6 points in the top-right of the plot with a beauty score of around 8 out of 10: are there truly only 6 points, or are there many more just stacked on top of each other? You can think of these as ties. Let’s break up these ties with a little random “jitter” added to the points in Figure 6.3.

Figure 6.3: Instructor evaluation scores at UT Austin: Jittered

      Jittering adds a little random bump to each of the points to break up these ties: just enough so you can distinguish them, but not so much that the plot is overly altered. Furthermore, jittering is strictly a visualization tool; it does not alter the original values in the dataset.
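The code that produced the jittered plot isn't shown above, but a jittered scatterplot along these lines could be created by swapping geom_point() for ggplot2's geom_jitter(); the width and height values below are illustrative guesses, not the book's settings:

ggplot(evals_ch6, aes(x = bty_avg, y = score)) +
  geom_jitter(width = 0.05, height = 0.05) +   # small random horizontal/vertical nudges
  labs(x = "Beauty Score", y = "Teaching Score",
       title = "Relationship of teaching and beauty scores (jittered)")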


      Let’s compare side-by-side the regular scatterplot in Figure 6.2 with the jittered scatterplot in Figure 6.3 in Figure 6.4.

Figure 6.4: Comparing regular and jittered scatterplots.

      We make several further observations:

1. Focusing our attention on the top-right of the plot again, where earlier there seemed to be only 6 points in the regular scatterplot, the jittered scatterplot shows there were in fact 9.
2. A further interesting trend is that the jittering revealed a large number of instructors with beauty scores of between 3 and 4.5, towards the lower end of the beauty scale.

Going forward for simplicity's sake, however, we'll only present regular scatterplots rather than jittered scatterplots; we'll just keep the overplotting in mind whenever looking at such plots. Going back to the scatterplot in Figure 6.2, let's improve on it by adding a "regression line" in Figure 6.5. This is easily done by adding a new layer to the ggplot code that created Figure 6.2: + geom_smooth(method = "lm"). A regression line is a "best fitting" line in that of all possible lines you could draw on this plot, it is "best" in terms of some mathematical criteria. We discuss the criteria for "best" in Subsection 6.3.3 below, but we suggest you read this only after covering the concept of a residual coming up in Subsection 6.1.3.

ggplot(evals_ch6, aes(x = bty_avg, y = score)) +
  geom_point() +
  labs(x = "Beauty Score", y = "Teaching Score",
       title = "Relationship of teaching and beauty scores") +
  geom_smooth(method = "lm")
Figure 6.5: Regression line

      When viewed on this plot, the regression line is a visual summary of the relationship between two numerical variables, in our case the outcome variable score and the explanatory variable bty_avg. The positive slope of the blue line is consistent with our observed correlation coefficient of 0.187 suggesting that there is a positive relationship between score and bty_avg. We’ll see later however that while the correlation coefficient is not equal to the slope of this line, they always have the same sign: positive or negative.


What are the grey bands surrounding the blue line? These are standard error bands, which can be thought of as error/uncertainty bands. Let's skip this idea for now and suppress these grey bands by adding the argument se = FALSE to geom_smooth(method = "lm"). We'll introduce standard errors in Chapter 8 on sampling, use them for constructing confidence intervals and conducting hypothesis tests in Chapters 9 and 10, and consider them when we revisit regression in Chapter 11.

ggplot(evals_ch6, aes(x = bty_avg, y = score)) +
  geom_point() +
  labs(x = "Beauty Score", y = "Teaching Score",
       title = "Relationship of teaching and beauty scores") +
  geom_smooth(method = "lm", se = FALSE)
Figure 6.6: Regression line without error bands

Learning check

      (LC6.1) Conduct a new exploratory data analysis with the same outcome variable \(y\) being score but with age as the new explanatory variable \(x\). Remember, this involves three things:

1. Looking at the raw values.
2. Computing summary statistics of the variables of interest.
3. Creating informative visualizations.

      What can you say about the relationship between age and teaching scores based on this exploration?


      6.1.2 Simple linear regression


You may recall from secondary school / high school algebra that the equation of a line is, in general, \(y = a + bx\); it is defined by two coefficients. Recall we defined coefficients earlier as "quantitative expressions of a specific property of a phenomenon." These two coefficients are:

• the intercept coefficient \(a\), or the value of \(y\) when \(x = 0\), and
• the slope coefficient \(b\), or the increase in \(y\) for every increase of one in \(x\).

      However, when defining a line specifically for regression, like the blue regression line in Figure 6.6, we use slightly different notation: the equation of the regression line is \(\widehat{y} = b_0 + b_1 \cdot x\) where

• the intercept coefficient is \(b_0\), or the value of \(\widehat{y}\) when \(x = 0\), and
• the slope coefficient is \(b_1\), or the increase in \(\widehat{y}\) for every increase of one in \(x\).

      Why do we put a “hat” on top of the \(y\)? It’s a form of notation commonly used in regression, which we’ll introduce in the next Subsection 6.1.3 when we discuss fitted values. For now, let’s ignore the hat and treat the equation of the line as you would from secondary school / high school algebra recognizing the slope and the intercept. We know looking at Figure 6.6 that the slope coefficient corresponding to bty_avg should be positive. Why? Because as bty_avg increases, professors tend to roughly have larger teaching evaluation scores. However, what are the specific values of the intercept and slope coefficients? Let’s not worry about computing these by hand, but instead let the computer do the work for us. Specifically let’s use R!


Let's get the value of the intercept and slope coefficients by outputting something called the linear regression table. We will fit the linear regression model to the data using the lm() function and save this as score_model. lm stands for "linear model", given that we are dealing with lines. When we say "fit", we mean "find the best-fitting line for this data."


      The lm() function that “fits” the linear regression model is typically used as lm(y ~ x, data = data_frame_name) where:

• y is the outcome variable, followed by a tilde (~). This is likely the key to the left of "1" on your keyboard. In our case, y is set to score.
• x is the explanatory variable. In our case, x is set to bty_avg. We call the combination y ~ x a model formula. Recall the use of this notation when we computed the correlation coefficient using the get_correlation() function in Subsection 6.1.1.
• data_frame_name is the name of the data frame that contains the variables y and x. In our case, data_frame_name is the evals_ch6 data frame.
score_model <- lm(score ~ bty_avg, data = evals_ch6)
score_model

Call:
lm(formula = score ~ bty_avg, data = evals_ch6)

Coefficients:
(Intercept)      bty_avg  
     3.8803       0.0666  

This output is telling us that the intercept coefficient \(b_0\) of the regression line is 3.8803 and the slope coefficient for bty_avg is 0.0666. Therefore the blue regression line in Figure 6.6 is


      \[\widehat{\text{score}} = b_0 + b_{\text{bty_avg}} \cdot\text{bty_avg} = 3.8803 + 0.0666\cdot\text{ bty_avg}\]


      where

• The intercept coefficient \(b_0 = 3.8803\) means that for instructors with a hypothetical beauty score of 0, we would expect them to have on average a teaching score of 3.8803. However, while the intercept has a mathematical interpretation when defining the regression line, it has no practical interpretation here: since bty_avg is an average of a panel of 6 students' ratings from 1 to 10, a bty_avg of 0 would be impossible. Furthermore, no instructors had a beauty score anywhere near 0 in this data.
• Of more interest is the slope coefficient associated with bty_avg: \(b_{\text{bty avg}} = +0.0666\). This is a numerical quantity that summarizes the relationship between the outcome and explanatory variables. Note that the sign is positive, suggesting a positive relationship between beauty scores and teaching scores: as beauty scores go up, so do teaching scores. The slope's precise interpretation is:

  For every increase of 1 unit in bty_avg, there is an associated increase of, on average, 0.0666 units of score.

Such interpretations need to be carefully worded:

• We only stated that there is an associated increase, and not necessarily a causal increase. For example, perhaps it's not that beauty directly affects teaching scores, but instead individuals from wealthier backgrounds tend to have had better education and training, and hence have higher teaching scores, while these same individuals also have higher beauty scores. Avoiding such reasoning can be summarized by the adage "correlation is not necessarily causation." In other words, just because two variables are correlated, it doesn't mean one directly causes the other. We discuss these ideas more in Subsection 6.3.2.
• We say that this associated increase is on average 0.0666 units of teaching score and not that the associated increase is exactly 0.0666 units of score across all values of bty_avg. This is because the slope is the average increase across all points as shown by the regression line in Figure 6.6. A small numerical sketch of this interpretation follows below.
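To make the slope interpretation concrete, here is a small sketch (not from the book) that uses the fitted coefficients in score_model to compute the predicted teaching score for a hypothetical instructor with a beauty score of 5; the value 5 is chosen purely for illustration:

b <- coef(score_model)   # named vector with elements "(Intercept)" and "bty_avg"

# Predicted teaching score when bty_avg = 5:
b["(Intercept)"] + b["bty_avg"] * 5
# roughly 3.8803 + 0.0666 * 5 = 4.21

# The same prediction via predict():
predict(score_model, newdata = data.frame(bty_avg = 5))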

Now that we've learned how to compute the equation for the blue regression line in Figure 6.6 and interpreted all its terms, let's take our modeling one step further. This time, after fitting the model using lm(), let's get something called the regression table using the get_regression_table() function from the moderndive package:

# Fit regression model:
score_model <- lm(score ~ bty_avg, data = evals_ch6)
# Get regression table:
get_regression_table(score_model)
Table 6.2: Linear regression table

| term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
|---|---|---|---|---|---|---|
| intercept | 3.880 | 0.076 | 50.96 | 0 | 3.731 | 4.030 |
| bty_avg | 0.067 | 0.016 | 4.09 | 0 | 0.035 | 0.099 |

      Note how we took the output of the model fit saved in score_model and used it as an input to the subsequent get_regression_table() function. The output now looks like a table: in fact it is a data frame. The values of the intercept and slope of 3.880 and 0.0666 are now in the estimate column. But what are the remaining 5 columns: std_error, statistic, p_value, lower_ci and upper_ci? What do they tell us? They tell us about both the statistical significance and practical significance of our model results. You can think of this loosely as the “meaningfulness” of the results from a statistical perspective.


      We are going to put aside these ideas for now and revisit them in Chapter 11 on (statistical) inference for regression, after we’ve had a chance to cover:

• Standard errors in Chapter 8 (std_error)
• Confidence intervals in Chapter 9 (lower_ci and upper_ci)
• Hypothesis testing in Chapter 10 (statistic and p_value).

      For now, we’ll only focus on the term and estimate columns of any regression table.


The get_regression_table() function from the moderndive package is an example of what's known as a wrapper function in computer programming, which takes other pre-existing functions and "wraps" them into a single function. This concept is illustrated in Figure 6.7.

Figure 6.7: The concept of a 'wrapper' function.

So all you need to worry about is what the inputs look like and what the outputs look like; you leave all the other details "under the hood of the car." In our regression modeling example, the get_regression_table() function has

• Input: A saved lm() linear regression
• Output: A data frame with information on the intercept and slope of the regression line.

      If you’re interested in learning more about the get_regression_table() function’s construction and thinking, see Subsection 6.3.4 below.
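As a rough illustration of the wrapper idea (a sketch of the concept only, not the actual implementation of get_regression_table()), a toy version could be assembled from base R pieces such as summary() and confint(); the function name toy_regression_table is made up for this example:

# A toy wrapper: combine coefficient estimates and confidence intervals
# into a single data frame, roughly mimicking a regression table
toy_regression_table <- function(model) {
  est <- summary(model)$coefficients   # estimates, std. errors, statistics, p-values
  ci  <- confint(model)                # lower/upper confidence interval bounds
  data.frame(
    term      = rownames(est),
    estimate  = est[, "Estimate"],
    std_error = est[, "Std. Error"],
    statistic = est[, "t value"],
    p_value   = est[, "Pr(>|t|)"],
    lower_ci  = ci[, 1],
    upper_ci  = ci[, 2],
    row.names = NULL
  )
}

toy_regression_table(score_model)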


Learning check

      (LC6.2) Fit a new simple linear regression using lm(score ~ age, data = evals_ch6) where age is the new explanatory variable \(x\). Get information about the “best-fitting” line from the regression table by applying the get_regression_table() function. How do the regression results match up with the results from your exploratory data analysis above?


      6.1.3 Observed/fitted values and residuals


      We just saw how to get the value of the intercept and the slope of the regression line from the regression table generated by get_regression_table(). Now instead, say we want information on individual points. In this case, we focus on one of the \(n = 463\) instructors in this dataset, corresponding to a single row of evals_ch6.


      For example, say we are interested in the 21st instructor in this dataset:

Table 6.3: Data for 21st instructor

| score | bty_avg | age |
|---|---|---|
| 4.9 | 7.33 | 31 |

      What is the value on the blue line corresponding to this instructor’s bty_avg of 7.333? In Figure 6.8 we mark three values in particular corresponding to this instructor.

• Red circle: This is the observed value \(y\) = 4.9 and corresponds to this instructor's actual teaching score.
• Red square: This is the fitted value \(\widehat{y}\) and corresponds to the value on the regression line for \(x\) = 7.333. This value is computed using the intercept and slope in the regression table above: \[\widehat{y} = b_0 + b_1 \cdot x = 3.88 + 0.067 \cdot 7.333 = 4.369\]
• Blue arrow: The length of this arrow is the residual and is computed by subtracting the fitted value \(\widehat{y}\) from the observed value \(y\). The residual can be thought of as the error or "lack of fit" of the regression line. In the case of this instructor, it is \(y - \widehat{y}\) = 4.9 - 4.369 = 0.531. In other words, the model was off by 0.531 teaching score units for this instructor.
Figure 6.8: Example of observed value, fitted value, and residual
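Both of these numbers can be verified directly in R using the saved model's coefficients (a quick check, not code from the book):

b <- coef(score_model)                               # "(Intercept)" and "bty_avg"
y_hat <- b["(Intercept)"] + b["bty_avg"] * 7.333     # fitted value, roughly 4.369
4.9 - y_hat                                          # residual, roughly 0.531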

      What if we want both

1. the fitted value \(\widehat{y} = b_0 + b_1 \cdot x\) and
2. the residual \(y - \widehat{y}\)

not only for the 21st instructor but for all 463 instructors in the study? Recall that each instructor corresponds to one of the 463 rows in the evals_ch6 data frame and also to one of the 463 points in the regression plot in Figure 6.6.


      We could repeat the above calculations by hand 463 times, but that would be tedious and time consuming. Instead, let’s use the get_regression_points() function that we’ve included in the moderndive R package. Note that in the table below we only present the results for the 21st through the 24th instructors.

regression_points <- get_regression_points(score_model)
regression_points
Table 6.4: Regression points (for only 21st through 24th instructor)

| ID | score | bty_avg | score_hat | residual |
|---|---|---|---|---|
| 21 | 4.9 | 7.33 | 4.37 | 0.531 |
| 22 | 4.6 | 7.33 | 4.37 | 0.231 |
| 23 | 4.5 | 7.33 | 4.37 | 0.131 |
| 24 | 4.4 | 5.50 | 4.25 | 0.153 |

The inputs to the get_regression_points() function are the same as those to get_regression_table(); however, the outputs are different. Let's inspect the individual columns:

• The score column represents the observed value of the outcome variable \(y\).
• The bty_avg column represents the values of the explanatory variable \(x\).
• The score_hat column represents the fitted values \(\widehat{y}\).
• The residual column represents the residuals \(y - \widehat{y}\).
      +

      get_regression_points() is another example of a wrapper function we described in Figure 6.7. If you’re curious about this function as well, check out Subsection 6.3.4.

      +

      Just as we did for the 21st instructor in the evals_ch6 dataset (in the first row of the table above), let’s repeat the above calculations for the 24th instructor in the evals_ch6 dataset (in the fourth row of the table above):

      +
        +
      • score = 4.4 is the observed value \(y\) for this instructor.
      • +
      • bty_avg = 5.50 is the value of the explanatory variable \(x\) for this instructor.
      • +
      • score_hat = 4.25 = 3.88 + 0.067 * \(x\) = 3.88 + 0.067 * 5.50 is the fitted value \(\widehat{y}\) for this instructor.
      • +
      • residual = 0.153 = 4.4 - 4.25 is the value of the residual for this instructor. In other words, the model was off by 0.153 teaching score units for this instructor.
      • +
      +

      More development of this idea appears in Section 6.3.3 and we encourage you to read that section after you investigate residuals.


      6.1.4 Residual analysis


      Recall the residuals can be thought of as the error or the “lack-of-fit” between the observed value \(y\) and the fitted value \(\widehat{y}\) on the blue regression line in Figure 6.6. Ideally when we fit a regression model, we’d like there to be no systematic pattern to these residuals. We’ll be more specific as to what we mean by no systematic pattern when we see Figure 6.10 below, but let’s keep this notion imprecise for now. Investigating any such patterns is known as residual analysis and is the theme of this section.


      We’ll perform our residual analysis in two ways:

1. Creating a scatterplot with the residuals on the \(y\)-axis and the original explanatory variable \(x\) on the \(x\)-axis.
2. Creating a histogram of the residuals, thereby showing the distribution of the residuals.

      First, recall in Figure 6.8 above we created a scatterplot where

• on the vertical axis we had the teaching score \(y\),
• on the horizontal axis we had the beauty score \(x\), and
• the blue arrow represented the residual for one particular instructor.

      Instead, in Figure 6.9 below, let’s create a scatterplot where

• On the vertical axis we have the residual \(y - \widehat{y}\) instead.
• On the horizontal axis we have the beauty score \(x\) as before:
ggplot(regression_points, aes(x = bty_avg, y = residual)) +
  geom_point() +
  labs(x = "Beauty Score", y = "Residual") +
  geom_hline(yintercept = 0, col = "blue", size = 1)
Figure 6.9: Plot of residuals over beauty score

      You can think of Figure 6.9 as Figure 6.8 but with the blue line flattened out to \(y=0\). Does it seem like there is no systematic pattern to the residuals? This question is rather qualitative and subjective in nature, thus different people may respond with different answers to the above question. However, it can be argued that there isn’t a drastic pattern in the residuals.


      Let’s now get a little more precise in our definition of no systematic pattern in the residuals. Ideally, the residuals should behave randomly. In addition,

1. The residuals should be on average 0. In other words, sometimes the regression model will make a positive error in that \(y - \widehat{y} > 0\), sometimes the regression model will make a negative error in that \(y - \widehat{y} < 0\), but on average the error is 0.
2. Further, the value and spread of the residuals should not depend on the value of \(x\).

      In Figure 6.10 below, we display some hypothetical examples where there are drastic patterns to the residuals. In Example 1, the value of the residual seems to depend on \(x\): the residuals tend to be positive for small and large values of \(x\) in this range, whereas values of \(x\) more in the middle tend to have negative residuals. In Example 2, while the residuals seem to be on average 0 for each value of \(x\), the spread of the residuals varies for different values of \(x\); this situation is known as heteroskedasticity.

Figure 6.10: Examples of less than ideal residual patterns

      The second way to perform a residual analysis is to look at the histogram of the residuals:

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(binwidth = 0.25, color = "white") +
  labs(x = "Residual")
Figure 6.11: Histogram of residuals

This histogram seems to indicate that we have more positive residuals than negative. Since the residual \(y - \widehat{y}\) is positive when \(y > \widehat{y}\), it seems our fitted teaching score from the regression model tends to underestimate the true teaching score. This histogram has a slight left-skew in that there is a long tail on the left. Another way to say this is that this data exhibits a negative skew. Is this a problem? Again, there is a certain amount of subjectivity in the response. In the authors' opinion, while there is a slight skew/pattern to the residuals, it isn't a large concern. On the other hand, others might disagree with our assessment. Here are examples of an ideal and less than ideal pattern to the residuals when viewed in a histogram:

Figure 6.12: Examples of ideal and less than ideal residual patterns

In fact, we'll see later on that we would like the residuals to be normally distributed with mean 0. In other words, be bell-shaped and centered at 0! While this requirement and residual analysis in general may seem to some of you as not being overly critical at this point, we'll see later, when we cover inference for regression in Chapter 11, that for the last five columns of the regression table from earlier (std_error, statistic, p_value, lower_ci, and upper_ci) to have valid interpretations, the above three conditions should roughly hold.
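As a quick numerical sanity check of the "on average 0" condition (a small sketch, not code from the book), one could compute the mean of the residuals stored in regression_points directly; for a least-squares fit with an intercept it should come out essentially zero:

mean(regression_points$residual)   # essentially 0, up to rounding of the stored residuals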


Learning check

      (LC6.3) Continuing with our regression using age as the explanatory variable and teaching score as the outcome variable, use the get_regression_points() function to get the observed values, fitted values, and residuals for all 463 instructors. Perform a residual analysis and look for any systematic patterns in the residuals. Ideally, there should be little to no pattern.


      6.2 One categorical explanatory variable


      It’s an unfortunate truth that life expectancy is not the same across various countries in the world; there are a multitude of factors that are associated with how long people live. International development agencies are very interested in studying these differences in the hope of understanding where governments should allocate resources to address this problem. In this section, we’ll explore differences in life expectancy in two ways:

1. Differences between continents: Are there significant differences in life expectancy, on average, between the five continents of the world: Africa, the Americas, Asia, Europe, and Oceania?
2. Differences within continents: How does life expectancy vary within the world's five continents? For example, is the spread of life expectancy among the countries of Africa larger than the spread of life expectancy among the countries of Asia?

      To answer such questions, we’ll study the gapminder dataset in the gapminder package. Recall we mentioned this dataset in Subsection 3.1.2 when we first studied the “Grammar of Graphics” introduced in Figure 3.1. This dataset has international development statistics such as life expectancy, GDP per capita, and population by country (\(n\) = 142) for 5-year intervals between 1952 and 2007.


      We’ll use this data for linear regression again, but note that our explanatory variable \(x\) is now categorical, and not numerical like when we covered simple linear regression in Section 6.1. More precisely, we have:

1. A numerical outcome variable \(y\). In this case, life expectancy.
2. A single categorical explanatory variable \(x\). In this case, the continent the country is part of.

      When the explanatory variable \(x\) is categorical, the concept of a “best-fitting” line is a little different than the one we saw previously in Section 6.1 where the explanatory variable \(x\) was numerical. We’ll study these differences shortly in Subsection 6.2.2, but first we conduct our exploratory data analysis.


      6.2.1 Exploratory data analysis


      Let’s load the gapminder data and filter() for only observations in 2007. Next we select() only the variables we’ll need along with gdpPercap, which is each country’s gross domestic product per capita (GDP). GDP is a rough measure of that country’s economic performance. (This will be used for the upcoming Learning Check). Lastly, we save this in a data frame with name gapminder2007:

library(gapminder)
gapminder2007 <- gapminder %>%
  filter(year == 2007) %>%
  select(country, continent, lifeExp, gdpPercap)

You should look at the raw data values both by bringing up RStudio's spreadsheet viewer and by using the glimpse() function. In Table 6.5 we only show 5 randomly selected countries out of 142:

View(gapminder2007)
Table 6.5: Random sample of 5 countries

| country | continent | lifeExp | gdpPercap |
|---|---|---|---|
| Slovak Republic | Europe | 74.7 | 18678 |
| Israel | Asia | 80.7 | 25523 |
| Bulgaria | Europe | 73.0 | 10681 |
| Tanzania | Africa | 52.5 | 1107 |
| Myanmar | Asia | 62.1 | 944 |
glimpse(gapminder2007)

Observations: 142
Variables: 4
$ country   <fct> Afghanistan, Albania, Algeria, Angola, Argentina, Australia…
$ continent <fct> Asia, Europe, Africa, Africa, Americas, Oceania, Europe, As…
$ lifeExp   <dbl> 43.8, 76.4, 72.3, 42.7, 75.3, 81.2, 79.8, 75.6, 64.1, 79.4,…
$ gdpPercap <dbl> 975, 5937, 6223, 4797, 12779, 34435, 36126, 29796, 1391, 33…

      We see that the variable continent is indeed categorical, as it is encoded as fct which stands for “factor.” This is R’s way of storing categorical variables. Let’s once again apply the skim() function from the skimr package to our two variables of interest: continent and lifeExp:

gapminder2007 %>%
  select(continent, lifeExp) %>%
  skim()

Skim summary statistics
 n obs: 142 
 n variables: 2 

── Variable type:factor ──────
  variable missing complete   n n_unique                         top_counts ordered
 continent       0      142 142        5 Afr: 52, Asi: 33, Eur: 30, Ame: 25   FALSE

── Variable type:numeric ─────
 variable missing complete   n  mean    sd    p0   p25   p50   p75 p100     hist
  lifeExp       0      142 142 67.01 12.07 39.61 57.16 71.94 76.41 82.6 ▂▂▂▂▂▃▇▇

      The output now reports summaries for categorical variables (the variable type: factor) separately from the numerical variables. For the categorical variable continent it now reports:

• missing, complete, and n, as before, which are the number of missing, complete, and total number of values.
• n_unique: the number of unique levels of this variable, corresponding to Africa, Asia, Americas, Europe, and Oceania.
• top_counts: in this case the top four counts: Africa has 52 entries each corresponding to a country, Asia has 33, Europe has 30, and the Americas has 25. Not displayed is Oceania with 2 countries.
• ordered: reporting whether the variable is "ordinal." In this case, it is not ordered.

      Given that the global median life expectancy is 71.94, half of the world’s countries (71 countries) will have a life expectancy less than 71.94. Further, half will have a life expectancy greater than this value. The mean life expectancy of 67.01 is lower however. Why are these two values different? Let’s look at a histogram of lifeExp in Figure 6.13 to see why.
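The code that produced Figure 6.13 isn't shown above, but a histogram along these lines could be created as follows (the binwidth of 5 years is an illustrative guess, not necessarily the book's setting):

ggplot(gapminder2007, aes(x = lifeExp)) +
  geom_histogram(binwidth = 5, color = "white") +
  labs(x = "Life expectancy", y = "Number of countries",
       title = "Histogram of life expectancy in 2007")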

Figure 6.13: Histogram of Life Expectancy in 2007

      We see that this data is left-skewed/negatively skewed: there are a few countries with very low life expectancies that are bringing down the mean life expectancy. However, the median is less sensitive to the effects of such outliers. Hence the median is greater than the mean in this case. Let’s proceed by comparing median and mean life expectancy between continents by adding a group_by(continent) to the above code:

lifeExp_by_continent <- gapminder2007 %>%
  group_by(continent) %>%
  summarize(median = median(lifeExp), mean = mean(lifeExp))
Table 6.6: Life expectancy by continent

| continent | median | mean |
|---|---|---|
| Africa | 52.9 | 54.8 |
| Americas | 72.9 | 73.6 |
| Asia | 72.4 | 70.7 |
| Europe | 78.6 | 77.6 |
| Oceania | 80.7 | 80.7 |

      We see now that there are differences in life expectancies between the continents. For example let’s focus on only medians. While the median life expectancy across all \(n = 142\) countries in 2007 was 71.935, the median life expectancy across the \(n =52\) countries in Africa was only 52.927.


      Let’s create a corresponding visualization. One way to compare the life expectancies of countries in different continents would be via a faceted histogram. Recall we saw back in the Data Visualization chapter, specifically Section 3.6, that facets allow us to split a visualization by the different levels of a categorical variable or factor variable. In Figure 6.14, the variable we facet by is continent, which is categorical with five levels, each corresponding to the five continents of the world.

ggplot(gapminder2007, aes(x = lifeExp)) +
  geom_histogram(binwidth = 5, color = "white") +
  labs(x = "Life expectancy", y = "Number of countries",
       title = "Life expectancy by continent") +
  facet_wrap(~ continent, nrow = 2)
Figure 6.14: Life expectancy in 2007

      Another way would be via a geom_boxplot where we map the categorical variable continent to the \(x\)-axis and the different life expectancies within each continent on the \(y\)-axis; we do this in Figure 6.15.

ggplot(gapminder2007, aes(x = continent, y = lifeExp)) +
  geom_boxplot() +
  labs(x = "Continent", y = "Life expectancy (years)",
       title = "Life expectancy by continent")
Figure 6.15: Life expectancy in 2007

Some people prefer comparing a numerical variable between different levels of a categorical variable, in this case comparing life expectancy between different continents, using a boxplot over a faceted histogram, as we can make quick comparisons with single horizontal lines. For example, we can see that even the country with the highest life expectancy in Africa still has a lower life expectancy than every country in Oceania.


      It’s important to remember however that the solid lines in the middle of the boxes correspond to the medians (i.e. the middle value) rather than the mean (the average). So, for example, if you look at Asia, the solid line denotes the median life expectancy of around 72 years, indicating to us that half of all countries in Asia have a life expectancy below 72 years whereas half of all countries in Asia have a life expectancy above 72 years. Furthermore, note that:

• Africa and Asia have much more spread/variation in life expectancy as indicated by the interquartile range (the height of the boxes).
• Oceania has almost no spread/variation, but this might in large part be due to the fact there are only two countries in Oceania: Australia and New Zealand.

Now, let's start making comparisons of life expectancy between continents. Let's use Africa as a baseline for comparison. Why Africa? Only because it happens to be first alphabetically; we could have just as appropriately used the Americas as the baseline for comparison. Using the "eyeball test" (just using our eyes to see if anything stands out), we make the following observations about differences in median life expectancy compared to the baseline of Africa:

1. The median life expectancy of the Americas is roughly 20 years greater.
2. The median life expectancy of Asia is roughly 20 years greater.
3. The median life expectancy of Europe is roughly 25 years greater.
4. The median life expectancy of Oceania is roughly 27.8 years greater.

      Let’s remember these four differences vs Africa corresponding to the Americas, Asia, Europe, and Oceania: 20, 20, 25, 27.8.


Learning check

      (LC6.4) Conduct a new exploratory data analysis with the same explanatory variable \(x\) being continent but with gdpPercap as the new outcome variable \(y\). Remember, this involves three things:

1. Looking at the raw values.
2. Computing summary statistics of the variables of interest.
3. Creating informative visualizations.

      What can you say about the differences in GDP per capita between continents based on this exploration?


      6.2.2 Linear regression


In Subsection 6.1.2 we introduced simple linear regression, which involves modeling a numerical outcome variable \(y\) as a function of a numerical explanatory variable \(x\). In our life expectancy example, we now instead have a categorical explanatory variable \(x\), continent. While we can still fit a regression model, given our categorical explanatory variable we no longer have a concept of a "best-fitting" line, but rather "differences relative to a baseline for comparison."


      Before we fit our regression model, let’s create a table similar to Table 6.6, but

1. Report the mean life expectancy for each continent.
2. Report the difference in mean life expectancy relative to Africa's mean life expectancy of 54.806 in the column "mean vs Africa"; this column is simply the "mean" column minus 54.806.

      Think back to your observations from the eyeball test of Figure 6.15 at the end of the last subsection. The column “mean vs Africa” is the same idea of comparing a summary statistic to a baseline for comparison, in this case the countries of Africa, but using means instead of medians.
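One way such a table could be computed with dplyr is sketched below; this is an added illustration, and the object name lifeExp_by_continent is our own choice:

lifeExp_by_continent <- gapminder2007 %>%
  group_by(continent) %>%
  summarize(mean = mean(lifeExp)) %>%
  mutate(`mean vs Africa` = mean - mean[continent == "Africa"])
lifeExp_by_continent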

Table 6.7: Mean life expectancy by continent

continent    mean   mean vs Africa
Africa       54.8    0.0
Americas     73.6   18.8
Asia         70.7   15.9
Europe       77.6   22.8
Oceania      80.7   25.9

Now, let’s use the get_regression_table() function we introduced in Section 6.1.2 to get the regression table for our gapminder2007 analysis:

lifeExp_model <- lm(lifeExp ~ continent, data = gapminder2007)
get_regression_table(lifeExp_model)
Table 6.8: Linear regression table

term               estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept              54.8       1.02      53.45        0      52.8      56.8
continentAmericas      18.8       1.80      10.45        0      15.2      22.4
continentAsia          15.9       1.65       9.68        0      12.7      19.2
continentEurope        22.8       1.70      13.47        0      19.5      26.2
continentOceania       25.9       5.33       4.86        0      15.4      36.5

      Just as before, we have the term and estimates columns of interest, but unlike before, we now have 5 rows corresponding to 5 outputs in our table: an intercept like before, but also continentAmericas, continentAsia, continentEurope, and continentOceania. What are these values? First, we must describe the equation for fitted value \(\widehat{y}\), which is a little more complicated when the \(x\) explanatory variable is categorical:

\[
\begin{align}
\widehat{\text{life exp}} &= b_0 + b_{\text{Amer}}\cdot\mathbb{1}_{\mbox{Amer}}(x) + b_{\text{Asia}}\cdot\mathbb{1}_{\mbox{Asia}}(x) + b_{\text{Euro}}\cdot\mathbb{1}_{\mbox{Euro}}(x) + b_{\text{Ocean}}\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot\mathbb{1}_{\mbox{Amer}}(x) + 15.9\cdot\mathbb{1}_{\mbox{Asia}}(x) + 22.8\cdot\mathbb{1}_{\mbox{Euro}}(x) + 25.9\cdot\mathbb{1}_{\mbox{Ocean}}(x)
\end{align}
\]

Let’s break this down. First, \(\mathbb{1}_{A}(x)\) is what’s known in mathematics as an “indicator function” that takes one of two possible values:

\[
\mathbb{1}_{A}(x) = \left\{
\begin{array}{ll}
1 & \text{if } x \text{ is in } A \\
0 & \text{otherwise}
\end{array}
\right.
\]

In a statistical modeling context this is also known as a “dummy variable”. In our case, let’s consider the first such indicator variable:

\[
\mathbb{1}_{\mbox{Amer}}(x) = \left\{
\begin{array}{ll}
1 & \text{if } \text{country } x \text{ is in the Americas} \\
0 & \text{otherwise}
\end{array}
\right.
\]
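R creates these indicator variables for us behind the scenes whenever a categorical variable appears in a regression formula. As an added illustration (not part of the original text), you can inspect them with base R's model.matrix():

# One column per non-baseline indicator variable, plus a column of 1's for the intercept
model.matrix(lifeExp ~ continent, data = gapminder2007) %>% 
  head()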

Now let’s interpret the terms in the estimate column of the regression table. First \(b_0 =\) intercept = 54.8 corresponds to the mean life expectancy for countries in Africa, since for country \(x\) in Africa we have the following equation:

\[
\begin{align}
\widehat{\text{life exp}} &= b_0 + b_{\text{Amer}}\cdot\mathbb{1}_{\mbox{Amer}}(x) + b_{\text{Asia}}\cdot\mathbb{1}_{\mbox{Asia}}(x) + b_{\text{Euro}}\cdot\mathbb{1}_{\mbox{Euro}}(x) + b_{\text{Ocean}}\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot\mathbb{1}_{\mbox{Amer}}(x) + 15.9\cdot\mathbb{1}_{\mbox{Asia}}(x) + 22.8\cdot\mathbb{1}_{\mbox{Euro}}(x) + 25.9\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot 0 + 15.9\cdot 0 + 22.8\cdot 0 + 25.9\cdot 0\\
&= 54.8
\end{align}
\]

i.e. all four of the indicator variables are equal to 0. Recall we stated earlier that we would treat Africa as the baseline group for comparison. Furthermore, this value corresponds to the group mean life expectancy for all African countries in Table 6.7.

Next, \(b_{\text{Amer}}\) = continentAmericas = 18.8 is the difference in mean life expectancy of countries in the Americas relative to Africa, or in other words, on average countries in the Americas had a life expectancy 18.8 years greater. The fitted value yielded by this equation is:

\[
\begin{align}
\widehat{\text{life exp}} &= b_0 + b_{\text{Amer}}\cdot\mathbb{1}_{\mbox{Amer}}(x) + b_{\text{Asia}}\cdot\mathbb{1}_{\mbox{Asia}}(x) + b_{\text{Euro}}\cdot\mathbb{1}_{\mbox{Euro}}(x) + b_{\text{Ocean}}\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot\mathbb{1}_{\mbox{Amer}}(x) + 15.9\cdot\mathbb{1}_{\mbox{Asia}}(x) + 22.8\cdot\mathbb{1}_{\mbox{Euro}}(x) + 25.9\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot 1 + 15.9\cdot 0 + 22.8\cdot 0 + 25.9\cdot 0\\
&= 54.8 + 18.8\\
&= 73.6
\end{align}
\]

i.e. in this case, only the indicator function \(\mathbb{1}_{\mbox{Amer}}(x)\) is equal to 1, while all others are 0. Recall that 73.6 corresponds to the group mean life expectancy for all countries in the Americas in Table 6.7.

Similarly, \(b_{\text{Asia}}\) = continentAsia = 15.9 is the difference in mean life expectancy of Asian countries relative to African countries, or in other words, on average countries in Asia had a life expectancy 15.9 years greater than those in Africa. The fitted value yielded by this equation is:

\[
\begin{align}
\widehat{\text{life exp}} &= b_0 + b_{\text{Amer}}\cdot\mathbb{1}_{\mbox{Amer}}(x) + b_{\text{Asia}}\cdot\mathbb{1}_{\mbox{Asia}}(x) + b_{\text{Euro}}\cdot\mathbb{1}_{\mbox{Euro}}(x) + b_{\text{Ocean}}\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot\mathbb{1}_{\mbox{Amer}}(x) + 15.9\cdot\mathbb{1}_{\mbox{Asia}}(x) + 22.8\cdot\mathbb{1}_{\mbox{Euro}}(x) + 25.9\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot 0 + 15.9\cdot 1 + 22.8\cdot 0 + 25.9\cdot 0\\
&= 54.8 + 15.9\\
&= 70.7
\end{align}
\]

      +

      i.e. in this case, only the indicator function \(\mathbb{1}_{\mbox{Asia}}(x)\) is equal to 1, but all others are 0. Recall that 70.7 corresponds to the group mean life expectancy for all countries in Asia in Table 6.7. The same logic applies to \(b_{\text{Euro}} = 22.8\) and \(b_{\text{Ocean}} = 25.9\); they correspond to the “offset” in mean life expectancy for countries in Europe and Oceania, relative to the mean life expectancy of the baseline group for comparison of African countries.
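One added way to double-check these fitted values (predict() is not a function relied on elsewhere in this analysis) is with base R:

# Fitted values for a country in each of three continents
predict(lifeExp_model, newdata = data.frame(continent = c("Africa", "Americas", "Asia")))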

Let’s generalize this idea a bit. If we fit a linear regression model using a categorical explanatory variable \(x\) that has \(k\) levels, the regression model will return an intercept and \(k - 1\) “slope” coefficients. When \(x\) is a numerical explanatory variable the interpretation is that of a “slope” coefficient, but when \(x\) is categorical the meaning is a little trickier: the \(k - 1\) coefficients are offsets relative to the baseline group.

      +

      In our case, since there are \(k = 5\) continents, the regression model returns an intercept corresponding to the baseline for comparison Africa and \(k - 1 = 4\) slope coefficients corresponding to the Americas, Asia, Europe, and Oceania. Africa was chosen as the baseline by R for no other reason than it is first alphabetically of the 5 continents. You can manually specify which continent to use as baseline instead of the default choice of whichever comes first alphabetically, but we leave that to a more advanced course. (The forcats package is particularly nice for doing this and we encourage you to explore using it.)
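For instance, a minimal added sketch using forcats (which is not one of the packages loaded for this chapter) to make Oceania the baseline might look like the following; the object names are our own:

library(forcats)

# Move Oceania to the front of the factor levels so it becomes the baseline
gapminder2007_relevel <- gapminder2007 %>%
  mutate(continent = fct_relevel(continent, "Oceania"))
lifeExp_model_relevel <- lm(lifeExp ~ continent, data = gapminder2007_relevel)
get_regression_table(lifeExp_model_relevel)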

Learning check

      (LC6.5) Fit a new linear regression using lm(gdpPercap ~ continent, data = gapminder2007) where gdpPercap is the new outcome variable \(y\). Get information about the “best-fitting” line from the regression table by applying the get_regression_table() function. How do the regression results match up with the results from your exploratory data analysis above?


      6.2.3 Observed/fitted values and residuals

Recall in Subsection 6.1.3 when we had a numerical explanatory variable \(x\), we defined:

1. Observed values \(y\), or the observed value of the outcome variable
2. Fitted values \(\widehat{y}\), or the value on the regression line for a given \(x\) value
3. Residuals \(y - \widehat{y}\), or the error between the observed value and the fitted value

      What do fitted values \(\widehat{y}\) and residuals \(y - \widehat{y}\) correspond to when the explanatory variable \(x\) is categorical? Let’s investigate these values for the first 10 countries in the gapminder2007 dataset:

Table 6.9: First 10 out of 142 countries

country      continent  lifeExp  gdpPercap
Afghanistan  Asia          43.8        975
Albania      Europe        76.4       5937
Algeria      Africa        72.3       6223
Angola       Africa        42.7       4797
Argentina    Americas      75.3      12779
Australia    Oceania       81.2      34435
Austria      Europe        79.8      36126
Bahrain      Asia          75.6      29796
Bangladesh   Asia          64.1       1391
Belgium      Europe        79.4      33693

      Recall the get_regression_points() function we used in Subsection 6.1.3 to return

• the observed value of the outcome variable,
• all explanatory variables,
• fitted values, and
• residuals for all points in the regression. In this case, each “point” (i.e. each row) corresponds to one of the 142 countries in the gapminder2007 dataset. They are also the 142 observations used to construct the boxplots in Figure 6.15.

regression_points <- get_regression_points(lifeExp_model)
regression_points
Table 6.10: Regression points (First 10 out of 142 countries)

ID  lifeExp  continent  lifeExp_hat  residual
1      43.8  Asia              70.7   -26.900
2      76.4  Europe            77.6    -1.226
3      72.3  Africa            54.8    17.495
4      42.7  Africa            54.8   -12.075
5      75.3  Americas          73.6     1.712
6      81.2  Oceania           80.7     0.515
7      79.8  Europe            77.6     2.180
8      75.6  Asia              70.7     4.907
9      64.1  Asia              70.7    -6.666
10     79.4  Europe            77.6     1.792

      Notice

• The fitted values lifeExp_hat \(\widehat{\text{lifeexp}}\). Countries in Africa have the same fitted value of 54.8, which is the mean life expectancy of Africa. Countries in Asia have the same fitted value of 70.7, which is the mean life expectancy of Asia. This similarly holds for countries in the Americas, Europe, and Oceania.
• The residual column is simply \(y - \widehat{y}\) = lifeexp - lifeexp_hat. These values can be interpreted as a particular country’s deviation from the mean life expectancy of its respective continent. For example, the first row of this dataset corresponds to Afghanistan, and the residual of \(-26.9 = 43.8 - 70.7\) is Afghanistan’s life expectancy minus the mean life expectancy of all Asian countries.

      6.2.4 Residual analysis

Recall our discussion on residuals from Section 6.1.4 where our goal was to investigate whether or not there was a systematic pattern to the residuals. Ideally, since residuals can be thought of as error, there should be no such pattern. While there are many ways to do such residual analysis, we focused on two approaches based on visualizations:

1. A plot with residuals on the vertical axis and the predictor (in this case continent) on the horizontal axis
2. A histogram of all residuals

      First, let’s plot the residuals versus continent in Figure 6.16, but also let’s plot all 142 points with a little horizontal random jitter by setting the width = 0.1 parameter in geom_jitter():

ggplot(regression_points, aes(x = continent, y = residual)) +
  geom_jitter(width = 0.1) +
  labs(x = "Continent", y = "Residual") +
  geom_hline(yintercept = 0, col = "blue")

Figure 6.16: Plot of residuals over continent


      We observe

1. There seems to be a rough balance of both positive and negative residuals for all 5 continents.
2. However, there is one clear outlier in Asia. It has the smallest residual, hence also the smallest life expectancy in Asia.

      Let’s investigate the 5 countries in Asia with the shortest life expectancy:

gapminder2007 %>%
  filter(continent == "Asia") %>%
  arrange(lifeExp)
Table 6.11: Countries in Asia with shortest life expectancy

country      continent  lifeExp  gdpPercap
Afghanistan  Asia          43.8        975
Iraq         Asia          59.5       4471
Cambodia     Asia          59.7       1714
Myanmar      Asia          62.1        944
Yemen, Rep.  Asia          62.7       2281

This is the earlier identified residual for Afghanistan of -26.9. Unfortunately, given the geopolitical turmoil of recent decades, individuals living in Afghanistan, in particular in 2007, had a drastically lower life expectancy.

Second, let’s look at a histogram of all 142 values of residuals in Figure 6.17. In this case, the residuals form a rather nice bell-shape, although there are a couple of very low and very high values at the tails. As we said previously, searching for patterns in residuals can be somewhat subjective, but ideally we hope there are no “drastic” patterns.

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(binwidth = 5, color = "white") +
  labs(x = "Residual")

Figure 6.17: Histogram of residuals

Learning check

      (LC6.6) Continuing with our regression using gdpPercap as the outcome variable and continent as the explanatory variable, use the get_regression_points() function to get the observed values, fitted values, and residuals for all 142 countries in 2007 and perform a residual analysis to look for any systematic patterns in the residuals. Is there a pattern? Please keep in mind that these types of questions are somewhat subjective and different people will most likely have different answers. The focus should be on being able to justify the conclusions made.


      6.4 Conclusion

In this chapter, you’ve seen what we call “basic regression” when you only have one explanatory variable. In Chapter 7, we’ll study multiple regression where we have more than one explanatory variable! In particular, we’ll see why we’ve been conducting the residual analyses from Subsections 6.1.4 and 6.2.4. We are actually verifying some very important assumptions that must be met for the std_error (standard error), p_value, lower_ci and upper_ci (the end-points of the confidence intervals) columns in our regression tables to have valid interpretation. Again, don’t worry for now if you don’t understand what these terms mean. After the next chapter on multiple regression, we’ll dive in!


      6.4.1 Script of R code

      +

      An R script file of all R code used in this chapter is available here.


      7 Multiple Regression

      +

      In Chapter 6 we introduced ideas related to modeling, in particular that the fundamental premise of modeling is to make explicit the relationship between an outcome variable \(y\) and an explanatory/predictor variable \(x\). Recall further the synonyms that we used to also denote \(y\) as the dependent variable and \(x\) as an independent variable or covariate.

There are many modeling approaches one could take, among the most well-known being linear regression, which was the focus of the last chapter. Whereas in the last chapter we focused solely on regression scenarios where there is only one explanatory/predictor variable, in this chapter we now focus on modeling scenarios where there is more than one. This case of regression with more than one explanatory variable is known as multiple regression. You can imagine when trying to model a particular outcome variable, like teaching evaluation score as in Section 6.1 or life expectancy as in Section 6.2, it would be very useful to incorporate more than one explanatory variable.

Since our regression models will now consider more than one explanatory/predictor variable, the interpretation of the associated effect of any one explanatory/predictor variable must be made in conjunction with the others. For example, say we are modeling individuals’ incomes as a function of their number of years of education and their parents’ wealth. When interpreting the effect of education on income, one has to consider the effect of their parents’ wealth at the same time, as these two variables are almost certainly related. Make note of this throughout this chapter and as you work on interpreting the results of multiple regression models into the future.


      Needed packages

Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). Read Section 2.3 for information on how to install and load R packages.

library(ggplot2)
library(dplyr)
library(moderndive)
library(ISLR)
library(skimr)

      DataCamp

      +

      The approach taken below of using more than one variable of information in models using multiple regression is identical to that taken in ModernDive co-author Albert Y. Kim’s DataCamp course “Modeling with Data in the Tidyverse.” If you’re interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters are Chapter 1 “Introduction to Modeling” and Chapter 3 “Modeling with Multiple Regression”.

      +

      7.1 Two numerical explanatory variables

      +

Let’s now attempt to identify factors that are associated with how much credit card debt an individual will have. The textbook An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani is an intermediate-level textbook on statistical and machine learning, freely available here. It has an accompanying R package called ISLR with datasets that the authors use to demonstrate various machine learning methods. One dataset that is frequently used by the authors is the Credit dataset, where predictions are made on the credit card balance held by \(n = 400\) credit card holders. These predictions are based on information about them like income, credit limit, and education level. Note that this dataset is not based on actual individuals; it is a simulated dataset used for educational purposes.

      +

      Since no information was provided as to who these \(n\) = 400 individuals are and how they came to be included in this dataset, it will be hard to make any scientific claims based on this data. Recall our discussion from the previous chapter that correlation does not necessarily imply causation. That being said, we’ll still use Credit to demonstrate multiple regression with:

1. A numerical outcome variable \(y\), in this case credit card balance.
2. Two explanatory variables:
   1. A first numerical explanatory variable \(x_1\). In this case, their credit limit.
   2. A second numerical explanatory variable \(x_2\). In this case, their income (in thousands of dollars).

      In the forthcoming Learning Checks, we’ll consider a different scenario:

1. The same numerical outcome variable \(y\): credit card balance.
2. Two new explanatory variables:
   1. A first numerical explanatory variable \(x_1\): their credit rating.
   2. A second numerical explanatory variable \(x_2\): their age.

      7.1.1 Exploratory data analysis

      +

      Let’s load the Credit data and select() only the needed subset of variables.

library(ISLR)
Credit <- Credit %>%
  select(Balance, Limit, Income, Rating, Age)

Let’s look at the raw data values both by bringing up RStudio’s spreadsheet viewer and by using the glimpse() function, although in Table 7.1 we only show 5 randomly selected credit card holders out of 400:

View(Credit)

Table 7.1: Random sample of 5 credit card holders

Balance  Limit  Income  Rating  Age
   1425   6045    39.8     459   32
    279   3300    15.1     266   66
    204   5308    80.6     394   57
   1050   9310   180.4     665   67
     15   4952    88.8     360   86
glimpse(Credit)

Observations: 400
Variables: 5
$ Balance <int> 333, 903, 580, 964, 331, 1151, 203, 872, 279, 1350, 1407, 0, …
$ Limit   <int> 3606, 6645, 7075, 9504, 4897, 8047, 3388, 7114, 3300, 6819, 8…
$ Income  <dbl> 14.9, 106.0, 104.6, 148.9, 55.9, 80.2, 21.0, 71.4, 15.1, 71.1…
$ Rating  <int> 283, 483, 514, 681, 357, 569, 259, 512, 266, 491, 589, 138, 3…
$ Age     <int> 34, 82, 71, 36, 68, 77, 37, 87, 66, 41, 30, 64, 57, 49, 75, 5…

      Let’s look at some summary statistics, again using the skim() function from the skimr package:

Credit %>% 
  select(Balance, Limit, Income) %>% 
  skim()

Skim summary statistics
 n obs: 400 
 n variables: 3 

── Variable type:integer ─────
 variable missing complete   n    mean      sd  p0     p25    p50     p75  p100     hist
  Balance       0      400 400  520.01  459.76   0   68.75  459.5  863     1999 ▇▃▃▃▂▁▁▁
    Limit       0      400 400 4735.6  2308.2  855 3088    4622.5 5872.75 13913 ▅▇▇▃▂▁▁▁

── Variable type:numeric ─────
 variable missing complete   n  mean    sd    p0   p25   p50   p75   p100     hist
   Income       0      400 400 45.22 35.24 10.35 21.01 33.12 57.47 186.63 ▇▃▂▁▁▁▁▁

      We observe for example:

1. The mean and median credit card balance are $520.01 and $459.50 respectively.
2. 25% of card holders had debts of $68.75 or less.
3. The mean and median credit card limit are $4735.60 and $4622.50 respectively.
4. 75% of these card holders had incomes of $57,470 or less.

Since our outcome variable Balance and the explanatory variables Limit and Income are numerical, we can compute the correlation coefficient between pairs of these variables. First, we could run the get_correlation() command as seen in Subsection 6.1.1 twice, once for each explanatory variable:

Credit %>% 
  get_correlation(Balance ~ Limit)
Credit %>% 
  get_correlation(Balance ~ Income)

Or we can simultaneously compute them by returning a correlation matrix in Table 7.2. We can read off the correlation coefficient for any pair of variables by looking them up in the appropriate row/column combination.

Credit %>%
  select(Balance, Limit, Income) %>% 
  cor()

Table 7.2: Correlations between credit card balance, credit limit, and income

         Balance  Limit  Income
Balance    1.000  0.862   0.464
Limit      0.862  1.000   0.792
Income     0.464  0.792   1.000

      For example, the correlation coefficient of:

1. Balance with itself is 1 as we would expect based on the definition of the correlation coefficient.
2. Balance with Limit is 0.862. This indicates a strong positive linear relationship, which makes sense as only individuals with large credit limits can accrue large credit card balances.
3. Balance with Income is 0.464. This is suggestive of another positive linear relationship, although not as strong as the relationship between Balance and Limit.
4. As an added bonus, we can read off the correlation coefficient of the two explanatory variables, Limit and Income, of 0.792. In this case, we say there is a high degree of collinearity between these two explanatory variables.

      Collinearity (or multicollinearity) is a phenomenon in which one explanatory variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. So in this case, if we knew someone’s credit card Limit and since Limit and Income are highly correlated, we could make a fairly accurate guess as to that person’s Income. Or put loosely, these two variables provided redundant information. For now let’s ignore any issues related to collinearity and press on.

      +

      Let’s visualize the relationship of the outcome variable with each of the two explanatory variables in two separate plots:

ggplot(Credit, aes(x = Limit, y = Balance)) +
  geom_point() +
  labs(x = "Credit limit (in $)", y = "Credit card balance (in $)", 
       title = "Relationship between balance and credit limit") +
  geom_smooth(method = "lm", se = FALSE)

ggplot(Credit, aes(x = Income, y = Balance)) +
  geom_point() +
  labs(x = "Income (in $1000)", y = "Credit card balance (in $)", 
       title = "Relationship between balance and income") +
  geom_smooth(method = "lm", se = FALSE)

Figure 7.1: Relationship between credit card balance and credit limit/income

      First, there is a positive relationship between credit limit and balance, since as credit limit increases so also does credit card balance; this is to be expected given the strongly positive correlation coefficient of 0.862. In the case of income, the positive relationship doesn’t appear as strong, given the weakly positive correlation coefficient of 0.464. However the two plots in Figure 7.1 only focus on the relationship of the outcome variable with each of the explanatory variables independently. To get a sense of the joint relationship of all three variables simultaneously through a visualization, let’s display the data in a 3-dimensional (3D) scatterplot, where

1. The numerical outcome variable \(y\) Balance is on the z-axis (vertical axis).
2. The two numerical explanatory variables form the “floor” axes. In this case:
   1. The first numerical explanatory variable \(x_1\) Income is on one of the floor axes.
   2. The second numerical explanatory variable \(x_2\) Limit is on the other floor axis.

      Click on the following image to open an interactive 3D scatterplot in your browser:


      Previously in Figure 6.6, we plotted a “best-fitting” regression line through a set of points where the numerical outcome variable \(y\) was teaching score and a single numerical explanatory variable \(x\) was bty_avg. What is the analogous concept when we have two numerical predictor variables? Instead of a best-fitting line, we now have a best-fitting plane, which is a 3D generalization of lines which exist in 2D. Click on the following image to open an interactive plot of the regression plane in your browser. Move the image around, zoom in, and think about how this plane generalizes the concept of a linear regression line to three dimensions.
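If you'd like to build such an interactive 3D scatterplot yourself, one option is the plotly package. The sketch below is an addition of ours, and plotly is not one of the packages loaded for this chapter:

library(plotly)

# Balance on the vertical axis, Income and Limit on the two "floor" axes
plot_ly(Credit, x = ~Income, y = ~Limit, z = ~Balance,
        type = "scatter3d", mode = "markers")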

Learning check

      (LC7.1) Conduct a new exploratory data analysis with the same outcome variable \(y\) being Balance but with Rating and Age as the new explanatory variables \(x_1\) and \(x_2\). Remember, this involves three things:

1. Looking at the raw values.
2. Computing summary statistics of the variables of interest.
3. Creating informative visualizations.

      What can you say about the relationship between a credit card holder’s balance and their credit rating and age?


      7.1.2 Multiple regression

      +

Just as we did when we had a single numerical explanatory variable \(x\) in Subsection 6.1.2 and a single categorical explanatory variable \(x\) in Subsection 6.2.2, let’s fit a regression model and obtain the regression table for our two numerical explanatory variable scenario. To fit a regression model and get a table using get_regression_table(), we now use a + to consider multiple explanatory variables. In this case since we want to perform a regression of Limit and Income simultaneously, we input Balance ~ Limit + Income.

Balance_model <- lm(Balance ~ Limit + Income, data = Credit)
get_regression_table(Balance_model)
Table 7.3: Multiple regression table

term       estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept  -385.179     19.465      -19.8        0  -423.446  -346.912
Limit         0.264      0.006       45.0        0     0.253     0.276
Income       -7.663      0.385      -19.9        0    -8.420    -6.906

      How do we interpret these three values that define the regression plane?

• Intercept: -$385.18 (rounded to two decimal points to represent cents). The intercept in our case represents the credit card balance for an individual who has both a credit Limit of $0 and Income of $0. In our data however, the intercept has limited practical interpretation as no individuals had Limit or Income values of $0 and furthermore the smallest credit card balance was $0. Rather, it is used to situate the regression plane in 3D space.
• Limit: $0.26. Now that we have multiple variables to consider, we have to add a caveat to our interpretation: taking all other variables in our model into account, for every increase of one unit in credit Limit (dollars), there is an associated increase of on average $0.26 in credit card balance. Note:
   • Just as we did in Subsection 6.1.2, we are not making any causal statements, only statements relating to the association between credit limit and balance.
   • We need to preface our interpretation of the associated effect of Limit with the statement “taking all other variables into account”, in this case Income, to emphasize that we are now jointly interpreting the associated effect of multiple explanatory variables in the same model and not in isolation.
• Income: -$7.66. Similarly, taking all other variables into account, for every increase of one unit in Income (in other words, $1000 in income), there is an associated decrease of on average $7.66 in credit card balance.

      However, recall in Figure 7.1 that when considered separately, both Limit and Income had positive relationships with the outcome variable Balance. As card holders’ credit limits increased their credit card balances tended to increase as well, and a similar relationship held for incomes and balances. In the above multiple regression, however, the slope for Income is now -7.66, suggesting a negative relationship between income and credit card balance. What explains these contradictory results?

      +

      This is known as Simpson’s Paradox, a phenomenon in which a trend appears in several different groups of data but disappears or reverses when these groups are combined. We expand on this in Subsection 7.3.2 where we’ll look at the relationship between credit Limit and credit card balance but split by different income bracket groups.
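As a preview of that idea, here is an added sketch of how you might start exploring it yourself; splitting Income into 4 brackets with cut_number() is purely our own choice of grouping:

# Split income into 4 brackets of roughly equal size, then plot within each
Credit %>%
  mutate(income_bracket = cut_number(Income, n = 4)) %>%
  ggplot(aes(x = Limit, y = Balance)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ income_bracket)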

Learning check

(LC7.2) Fit a new multiple regression using lm(Balance ~ Rating + Age, data = Credit) where Rating and Age are the new numerical explanatory variables \(x_1\) and \(x_2\). Get information about the “best-fitting” line from the regression table by applying the get_regression_table() function. How do the regression results match up with the results from your exploratory data analysis above?


      7.1.3 Observed/fitted values and residuals

      +

As we did previously, let’s unpack the output of the get_regression_points() function for our model of credit card balance, shown in Table 7.4, for all 400 card holders in the dataset. Recall that each card holder corresponds to one of the 400 rows in the Credit data frame and also to one of the 400 3D points in the 3D scatterplots in Subsection 7.1.1.

regression_points <- get_regression_points(Balance_model)
regression_points
Table 7.4: Regression points (first 5 rows of 400)

ID  Balance  Limit  Income  Balance_hat  residual
1       333   3606    14.9          454    -120.8
2       903   6645   106.0          559     344.3
3       580   7075   104.6          683    -103.4
4       964   9504   148.9          986     -21.7
5       331   4897    55.9          481    -150.0

      Recall the format of the output:

• Balance corresponds to \(y\) (the observed value)
• Balance_hat corresponds to \(\widehat{y}\) (the fitted value)
• residual corresponds to \(y - \widehat{y}\) (the residual)

      7.1.4 Residual analysis

      +

Recall in Section 6.1.4, our first residual analysis plot investigated the presence of any systematic pattern in the residuals when we had a single numerical predictor: bty_avg. For the Credit card dataset, since we have two numerical predictors, Limit and Income, we must perform this twice:

ggplot(regression_points, aes(x = Limit, y = residual)) +
  geom_point() +
  labs(x = "Credit limit (in $)", y = "Residual", title = "Residuals vs credit limit")

ggplot(regression_points, aes(x = Income, y = residual)) +
  geom_point() +
  labs(x = "Income (in $1000)", y = "Residual", title = "Residuals vs income")

Figure 7.2: Residuals vs credit limit and income

In this case, there does appear to be a systematic pattern to the residuals, as the scatter of the residuals around the line \(y=0\) is definitely not consistent. This behavior of the residuals is further evidenced by the histogram of residuals in Figure 7.3. We observe that the residuals have a slight right-skew (recall we say that data is right-skewed, or positively-skewed, if there is a tail to the right). Ideally, these residuals should be bell-shaped around a residual value of 0.

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(color = "white") +
  labs(x = "Residual")

Figure 7.3: Histogram of residuals

      Another way to interpret this histogram is that since the residual is computed as \(y - \widehat{y}\) = balance - balance_hat, we have some values where the fitted value \(\widehat{y}\) is very much lower than the observed value \(y\). In other words, we are underestimating certain credit card holders’ balances by a very large amount.

Learning check

      (LC7.3) Continuing with our regression using Rating and Age as the explanatory variables and credit card Balance as the outcome variable, use the get_regression_points() function to get the observed values, fitted values, and residuals for all 400 credit card holders. Perform a residual analysis and look for any systematic patterns in the residuals.


      7.2 One numerical & one categorical explanatory variable

      +

      Let’s revisit the instructor evaluation data introduced in Section 6.1, where we studied the relationship between instructor evaluation scores and their beauty scores. This analysis suggested that there is a positive relationship between bty_avg and score, in other words as instructors had higher beauty scores, they also tended to have higher teaching evaluation scores. Now let’s say instead of bty_avg we are interested in the numerical explanatory variable \(x_1\) age and furthermore we want to use a second explanatory variable \(x_2\), the (binary) categorical variable gender.

      +

      Note: This study only focused on the gender binary of "male" or "female" when the data was collected and analyzed years ago. It has been tradition to use gender as an “easy” binary variable in the past in statistical analyses. We have chosen to include it here because of the interesting results of the study, but we also understand that a segment of the population is not included in this dichotomous assignment of gender and we advocate for more inclusion in future studies to show representation of groups that do not identify with the gender binary. We now resume our analyses using this evals data and hope that others find these results interesting and worth further exploration.

      +

      Our modeling scenario now becomes

1. A numerical outcome variable \(y\). As before, instructor evaluation score.
2. Two explanatory variables:
   1. A numerical explanatory variable \(x_1\): in this case, their age.
   2. A categorical explanatory variable \(x_2\): in this case, their binary gender.

      7.2.1 Exploratory data analysis

      +

Let’s reload the evals data and select() only the needed subset of variables. Note that these are different from the variables chosen in Chapter 6. Let’s give this the name evals_ch7.

evals_ch7 <- evals %>%
  select(score, age, gender)

Let’s look at the raw data values both by bringing up RStudio’s spreadsheet viewer and by using the glimpse() function, although in Table 7.5 we only show 5 randomly selected instructors out of 463:

View(evals_ch7)

Table 7.5: Random sample of 5 instructors

score  age  gender
  3.6   34  male
  4.9   43  male
  3.3   47  male
  4.4   33  female
  4.7   60  male

      Let’s look at some summary statistics using the skim() function from the skimr package:

evals_ch7 %>% 
  skim()

Skim summary statistics
 n obs: 463 
 n variables: 3 

── Variable type:factor ──────
 variable missing complete   n n_unique                top_counts ordered
   gender       0      463 463        2 mal: 268, fem: 195, NA: 0   FALSE

── Variable type:integer ─────
 variable missing complete   n  mean  sd p0 p25 p50 p75 p100     hist
      age       0      463 463 48.37 9.8 29  42  48  57   73 ▅▅▅▇▅▇▂▁

── Variable type:numeric ─────
 variable missing complete   n mean   sd  p0 p25 p50 p75 p100     hist
    score       0      463 463 4.17 0.54 2.3 3.8 4.3 4.6    5 ▁▁▂▃▅▇▇▆

Furthermore, let’s compute the correlation between the two numerical variables we have, score and age. Recall that correlation coefficients only exist between numerical variables. We observe that they are weakly negatively correlated.

evals_ch7 %>% 
  get_correlation(formula = score ~ age)

# A tibble: 1 x 1
  correlation
        <dbl>
1      -0.107

      In Figure 7.4, we plot a scatterplot of score over age. Given that gender is a binary categorical variable in this study, we can make some interesting tweaks:

1. We can assign a color to points from each of the two levels of gender: female and male.
2. Furthermore, the geom_smooth(method = "lm", se = FALSE) layer automatically fits a different regression line for each level of gender since we have provided color = gender at the top level in ggplot(). This allows all geom_etries that follow to have the same mapping of aes()thetics to variables throughout the plot.

ggplot(evals_ch7, aes(x = age, y = score, color = gender)) +
  geom_jitter() +
  labs(x = "Age", y = "Teaching Score", color = "Gender") +
  geom_smooth(method = "lm", se = FALSE)

Figure 7.4: Instructor evaluation scores at UT Austin split by gender (jittered)

      We notice some interesting trends:

1. There are almost no women faculty over the age of 60. We can see this by the lack of red dots above 60.
2. Fitting separate regression lines for men and women, we see they have different slopes. We see that the associated effect of increasing age seems to be much harsher for women than for men. In other words, as women age, the drop in their teaching score appears to be faster.

      7.2.2 Multiple regression: Parallel slopes model

      +

Much like we started to consider multiple explanatory variables using the + sign in Subsection 7.1.2, let’s fit a regression model and get the regression table. This time we give the name score_model_2 to our regression model fit, so as to not overwrite the model score_model from Section 6.1.2.

score_model_2 <- lm(score ~ age + gender, data = evals_ch7)
get_regression_table(score_model_2)
Table 7.6: Regression table

term        estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept      4.484      0.125      35.79    0.000     4.238     4.730
age           -0.009      0.003      -3.28    0.001    -0.014    -0.003
gendermale     0.191      0.052       3.63    0.000     0.087     0.294

      The modeling equation for this scenario is:

\[
\begin{align}
\widehat{y} &= b_0 + b_1 \cdot x_1 + b_2 \cdot x_2 \\
\widehat{\mbox{score}} &= b_0 + b_{\mbox{age}} \cdot \mbox{age} + b_{\mbox{male}} \cdot \mathbb{1}_{\mbox{is male}}(x)
\end{align}
\]

where \(\mathbb{1}_{\mbox{is male}}(x)\) is an indicator function for gender == male. In other words, \(\mathbb{1}_{\mbox{is male}}(x)\) equals one if the current observation corresponds to a male professor, and 0 if the current observation corresponds to a female professor. This model can be visualized in Figure 7.5.

Figure 7.5: Instructor evaluation scores at UT Austin by gender: same slope
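A figure like this one can be reproduced with code along the following lines; this is an added sketch that assumes your version of the moderndive package provides the geom_parallel_slopes() layer:

ggplot(evals_ch7, aes(x = age, y = score, color = gender)) +
  geom_point() +
  labs(x = "Age", y = "Teaching Score", color = "Gender") +
  geom_parallel_slopes(se = FALSE)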


      We see that:

• Females are treated as the baseline for comparison for no other reason than “female” is alphabetically earlier than “male.” The \(b_{male} = 0.1906\) is the vertical “bump” that men get in their teaching evaluation scores. Or more precisely, it is the average difference in teaching score that men get relative to the baseline of women.
• Accordingly, the intercepts (which in this case make no sense since no instructor can have an age of 0) are:
   • for women: \(b_0\) = 4.484
   • for men: \(b_0 + b_{male}\) = 4.484 + 0.191 = 4.675
• Both men and women have the same slope. In other words, in this model the associated effect of age is the same for men and women. So for every increase of one year in age, there is on average an associated change of \(b_{age}\) = -0.009 (a decrease) in teaching score.

But wait, why is Figure 7.5 different from Figure 7.4? What is going on? What we have in the original plot is known as an interaction effect between age and gender. Fitting a separate model for each of men and women, we see that the resulting regression lines are different. Thus, age and gender appear to interact: the associated effect of age differs for men and women.


      7.2.3 Multiple regression: Interaction model

      +

We say a model has an interaction effect if the associated effect of one variable depends on the value of another variable. These types of models usually prove to be tricky to interpret at first glance because of their complexity. In this case, the effect of age will depend on the value of gender. Put differently, the effect of age on teaching scores will differ for men and for women, as was suggested by the different slopes for men and women in our visual exploratory data analysis in Figure 7.4.

      +

Let’s fit a regression with an interaction term. Instead of using the + sign in the enumeration of explanatory variables, we use the * sign. Let’s fit this regression and save it in score_model_interaction, then get the regression table using the get_regression_table() function as before.

score_model_interaction <- lm(score ~ age * gender, data = evals_ch7)
get_regression_table(score_model_interaction)
Table 7.7: Regression table

term            estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept          4.883      0.205      23.80    0.000     4.480     5.286
age               -0.018      0.004      -3.92    0.000    -0.026    -0.009
gendermale        -0.446      0.265      -1.68    0.094    -0.968     0.076
age:gendermale     0.014      0.006       2.45    0.015     0.003     0.024

      The modeling equation for this scenario is:

\[
\begin{align}
\widehat{y} &= b_0 + b_1 \cdot x_1 + b_2 \cdot x_2 + b_3 \cdot x_1 \cdot x_2\\
\widehat{\mbox{score}} &= b_0 + b_{\mbox{age}} \cdot \mbox{age} + b_{\mbox{male}} \cdot \mathbb{1}_{\mbox{is male}}(x) + b_{\mbox{age,male}} \cdot \mbox{age} \cdot \mathbb{1}_{\mbox{is male}}(x)
\end{align}
\]

      +

      Oof, that’s a lot of rows in the regression table output and a lot of terms in the model equation. The fourth term being added on the right hand side of the equation corresponds to the interaction term. Let’s simplify things by considering men and women separately. First, recall that \(\mathbb{1}_{\mbox{is male}}(x)\) equals 1 if a particular observation (or row in evals_ch7) corresponds to a male instructor. In this case, using the values from the regression table the fitted value of \(\widehat{\mbox{score}}\) is:

\[
\begin{align}
\widehat{\mbox{score}} &= b_0 + b_{\mbox{age}} \cdot \mbox{age} + b_{\mbox{male}} \cdot \mathbb{1}_{\mbox{is male}}(x) + b_{\mbox{age,male}} \cdot \mbox{age} \cdot \mathbb{1}_{\mbox{is male}}(x) \\
&= b_0 + b_{\mbox{age}} \cdot \mbox{age} + b_{\mbox{male}} \cdot 1 + b_{\mbox{age,male}} \cdot \mbox{age} \cdot 1 \\
&= \left(b_0 + b_{\mbox{male}}\right) + \left(b_{\mbox{age}} + b_{\mbox{age,male}} \right) \cdot \mbox{age} \\
&= \left(4.883 - 0.446\right) + \left(-0.018 + 0.014 \right) \cdot \mbox{age} \\
&= 4.437 - 0.004 \cdot \mbox{age}
\end{align}
\]

      +

      Second, recall that \(\mathbb{1}_{\mbox{is male}}(x)\) equals 0 if a particular observation corresponds to a female instructor. Again, using the values from the regression table the fitted value of \(\widehat{\mbox{score}}\) is:

\[
\begin{align}
\widehat{\mbox{score}} &= b_0 + b_{\mbox{age}} \cdot \mbox{age} + b_{\mbox{male}} \cdot \mathbb{1}_{\mbox{is male}}(x) + b_{\mbox{age,male}} \cdot \mbox{age} \cdot \mathbb{1}_{\mbox{is male}}(x) \\
&= b_0 + b_{\mbox{age}} \cdot \mbox{age} + b_{\mbox{male}} \cdot 0 + b_{\mbox{age,male}} \cdot \mbox{age} \cdot 0 \\
&= b_0 + b_{\mbox{age}} \cdot \mbox{age}\\
&= 4.883 - 0.018 \cdot \mbox{age}
\end{align}
\]

      +

      Let’s summarize these values in a table:

Table 7.8: Comparison of male and female intercepts and age slopes

Gender              Intercept  Slope for age
Male instructors         4.44         -0.004
Female instructors       4.88         -0.018

We see that while male instructors have a lower intercept, as they age they have a less steep associated average decrease in teaching scores: -0.004 teaching score units per year as opposed to -0.018 for women. This is consistent with the different slopes and intercepts of the red and blue regression lines fit in Figure 7.4. Recall our definition of a model having an interaction effect: when the associated effect of one variable, in this case age, depends on the value of another variable, in this case gender.
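These intercepts and slopes can also be recovered directly from the fitted model's coefficients. The following is an added sketch using base R's coef():

coefs <- coef(score_model_interaction)

# Female instructors: baseline intercept and slope
c(intercept = unname(coefs["(Intercept)"]), slope = unname(coefs["age"]))
# Male instructors: add the gendermale and age:gendermale offsets
c(intercept = unname(coefs["(Intercept)"] + coefs["gendermale"]),
  slope = unname(coefs["age"] + coefs["age:gendermale"]))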

      +

      But how do we know when it’s appropriate to include an interaction effect? For example, which is the more appropriate model? The regular multiple regression model without an interaction term we saw in Section 7.2.2 or the multiple regression model with the interaction term we just saw? We’ll revisit this question in Chapter 11 on “inference for regression.”


      7.2.4 Observed/fitted values and residuals

      +

      Now say we want to apply the above calculations for male and female instructors for all 463 instructors in the evals_ch7 dataset. As our multiple regression models get more and more complex, computing such values by hand gets more and more tedious. The get_regression_points() function spares us this tedium and returns all fitted values and all residuals. For simplicity, let’s focus only on the fitted interaction model, which is saved in score_model_interaction.

regression_points <- get_regression_points(score_model_interaction)
regression_points
Table 7.9: Regression points (first 5 rows of 463)

ID  score  age  gender  score_hat  residual
1     4.7   36  female       4.25     0.448
2     4.1   36  female       4.25    -0.152
3     3.9   36  female       4.25    -0.352
4     4.8   36  female       4.25     0.548
5     4.6   59  male         4.20     0.399

      Recall the format of the output:

• score corresponds to \(y\) the observed value
• score_hat corresponds to \(\widehat{y} = \widehat{\mbox{score}}\) the fitted value
• residual corresponds to the residual \(y - \widehat{y}\)

      7.2.5 Residual analysis

      +

      As always, let’s perform a residual analysis first with a histogram, which we can facet by gender:

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(binwidth = 0.25, color = "white") +
  labs(x = "Residual") +
  facet_wrap(~gender)

Figure 7.6: Interaction model histogram of residuals

      Second, the residuals as compared to the predictor variables:

• \(x_1\): numerical explanatory/predictor variable of age
• \(x_2\): categorical explanatory/predictor variable of gender

ggplot(regression_points, aes(x = age, y = residual)) +
  geom_point() +
  labs(x = "age", y = "Residual") +
  geom_hline(yintercept = 0, col = "blue", size = 1) +
  facet_wrap(~ gender)

Figure 7.7: Interaction model residuals vs predictor

      7.4 Conclusion


      7.4.1 What’s to come?

      +

      Congratulations! We’re ready to proceed to the third portion of this book: “statistical inference” using a new package called infer. Once we’ve covered Chapters 8 on sampling, 9 on confidence intervals, and 10 on hypothesis testing, we’ll come back to the models we’ve seen in “data modeling” in Chapter 11 on inference for regression. As we said at the end of Chapter 6, we’ll see why we’ve been conducting the residual analyses from Subsections 7.1.4 and 7.2.5. We are actually verifying some very important assumptions that must be met for the std_error (standard error), p_value, conf_low and conf_high (the end-points of the confidence intervals) columns in our regression tables to have valid interpretation.

      +

      Up next:


      7.4.2 Script of R code

      +

      An R script file of all R code used in this chapter is available here.


      8 Sampling

      +

      In this chapter we kick off the third segment of this book, statistical inference, by learning about sampling. The concepts behind sampling form the basis of confidence intervals and hypothesis testing, which we’ll cover in Chapters 9 and 10 respectively. We will see that the tools that you learned in the data science segment of this book (data visualization, “tidy” data format, and data wrangling) will also play an important role here in the development of your understanding. As mentioned before, the concepts throughout this text all build into a culmination allowing you to “think with data.”


      Needed packages

      +

      Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). If needed, read Section 2.3 for information on how to install and load R packages.

library(dplyr)
library(ggplot2)
library(moderndive)

      8.1 Introduction to sampling

      +

      Let’s kick off this chapter immediately with an exercise that involves sampling. Imagine you are given a large bowl with 2400 balls that are either red or white. We are interested in the proportion of balls in this bowl that are red, but you don’t have the time to do an exhaustive count. You are also given a “shovel” that you can insert into this bowl…

Figure 8.1: A bowl with 2400 balls

      … and extract a sample of 50 balls:

Figure 8.2: A shovel used to extract a sample of size n = 50

      Inference via sampling

      +

      Why did we go through the trouble of enumerating all the above concepts and terminology?

      +

      The moral of the story:

- If the sampling of a sample of size \(n\) is done at random, then
- The sample is unbiased and representative of the population, thus
- Any result based on the sample can generalize to the population, thus
- The point estimate/sample statistic is a “good guess” of the unknown population parameter of interest

      and thus we have inferred about the population based on our sample. In the above example:

- If we properly mix the balls by say stirring the bowl first, then use the shovel to extract a sample of size \(n=50\), then
- The contents of the shovel will “look like” the contents of the bowl, thus
- Any results based on the sample of \(n=50\) balls can generalize to the large bowl of \(N=2400\) balls, thus
- The sample proportion \(\widehat{p}\) of the \(n=50\) balls in the shovel that are red is a “good guess” of the true population proportion \(p\) of the \(N=2400\) balls that are red.

      and thus we have inferred some new piece of information about the bowl based on our sample extracted by shovel.


At this point, you might be saying to yourself: “Big deal, why do we care about this bowl?” As hopefully you’ll soon come to appreciate, this sampling bowl exercise is merely a simulation representing the reality of many important sampling scenarios in a simplified and accessible setting. One sampling scenario in particular is familiar to many: polling. Whether for market research or for political purposes, polls inform much of the world’s decision and opinion making, and understanding the mechanism behind them can better inform your statistical citizenship. We’ll tie in everything we learn in this chapter with an example relating to a 2013 poll on President Obama’s approval ratings among young adults in Section 8.4.


      8.2 Tactile sampling simulation


Let’s start by revisiting the tactile sampling illustrated with the “sampling bowl” in Figures 8.1 and 8.2. By tactile we mean with your hands and to the touch. We’ll break down the act of tactile sampling from the bowl with the shovel using our newly acquired concepts and terminology relating to sampling. In particular we’ll study how sampling variability affects outcomes, which we’ll illustrate through simulations of repeated sampling. To this end, we’ll use both the above-mentioned tactile simulation and a virtual simulation. By virtual we mean on the computer.


      8.2.1 Using shovel once


      Let’s now view our shovel through the lens of sampling with the following 3-step tactile sampling simulation:


Step 1: Use the shovel to take a sample of size \(n=50\) balls from the bowl as seen in Figure 8.3.

Figure 8.3: Step 1: Take sample of size \(n=50\)

      Step 2: Pour them into a cup and

- Count the number that are red, then
- Compute the sample proportion \(\widehat{p}\) of the \(n=50\) balls that are red

as seen in Figure 8.4 below. Note from the figure that there are 18 balls out of \(n=50\) that are red. The sample proportion red for this particular sample is thus \(\widehat{p} = 18 / 50 = 0.36\).

Figure 8.4: Step 2: Pour into Red Solo Cup and compute \(\widehat{p}\)

      Step 3: Mark the sample proportion \(\widehat{p}\) in a hand-drawn histogram, just like our intrepid students are doing in Figure 8.5.

Figure 8.5: Step 3: Mark \(\widehat{p}\)’s in histogram

      Repeat Steps 1-3 a few times: After a few groups of students complete this exercise, let’s draw the resulting histogram by hand. In Figure 8.6 we have the resulting hand-drawn histogram for 10 groups of students.

Figure 8.6: Step 3: Histogram of 10 values of \(\widehat{p}\)

      Observe the behavior of the 10 different values of the sample proportion \(\widehat{p}\) in the histogram of their distribution, in particular where the values center and how much they spread out, in other words how much they vary. Note:

- The lowest value of \(\widehat{p}\) was somewhere between 0.20 and 0.25.
- The highest value of \(\widehat{p}\) was somewhere between 0.45 and 0.50.
- Five of the sample proportions \(\widehat{p}\) clustered together: five different samples of size \(n=50\) yielded sample proportions in the range 0.30 to 0.35.

      Let’s now look at some real-life outcomes of this tactile sampling simulation. We present the actual results for not 10 groups of students, but 33 groups of students below!


      8.2.2 Using shovel 33 times


All told, 33 groups took samples. In other words, the shovel was used 33 times and 33 values of the sample proportion \(\widehat{p}\) were computed; this data is saved in the tactile_prop_red data frame included in the moderndive package. Let’s display its contents in the table below. Notice how the replicate column enumerates each of the 33 groups, red_balls is the count of balls in the sample of size \(n=50\) that were red, and prop_red is the resulting sample proportion \(\widehat{p}\) red.

```r
tactile_prop_red
View(tactile_prop_red)
```

| group | replicate | red_balls | prop_red |
|---|---:|---:|---:|
| Ilyas, Yohan | 1 | 21 | 0.42 |
| Morgan, Terrance | 2 | 17 | 0.34 |
| Martin, Thomas | 3 | 21 | 0.42 |
| Clark, Frank | 4 | 21 | 0.42 |
| Riddhi, Karina | 5 | 18 | 0.36 |
| Andrew, Tyler | 6 | 19 | 0.38 |
| Julia | 7 | 19 | 0.38 |
| Rachel, Lauren | 8 | 11 | 0.22 |
| Daniel, Caroline | 9 | 15 | 0.30 |
| Josh, Maeve | 10 | 17 | 0.34 |
| Emily, Emily | 11 | 16 | 0.32 |
| Conrad, Emily | 12 | 18 | 0.36 |
| Oliver, Erik | 13 | 17 | 0.34 |
| Isabel, Nam | 14 | 21 | 0.42 |
| X, Claire | 15 | 15 | 0.30 |
| Cindy, Kimberly | 16 | 20 | 0.40 |
| Kevin, James | 17 | 11 | 0.22 |
| Nam, Isabelle | 18 | 21 | 0.42 |
| Harry, Yuko | 19 | 15 | 0.30 |
| Yuki, Eileen | 20 | 16 | 0.32 |
| Ramses | 21 | 23 | 0.46 |
| Joshua, Elizabeth, Stanley | 22 | 15 | 0.30 |
| Siobhan, Jane | 23 | 18 | 0.36 |
| Jack, Will | 24 | 16 | 0.32 |
| Caroline, Katie | 25 | 21 | 0.42 |
| Griffin, Y | 26 | 18 | 0.36 |
| Kaitlin, Jordan | 27 | 17 | 0.34 |
| Ella, Garrett | 28 | 18 | 0.36 |
| Julie, Hailin | 29 | 15 | 0.30 |
| Katie, Caroline | 30 | 21 | 0.42 |
| Mallory, Damani, Melissa | 31 | 21 | 0.42 |
| Katie | 32 | 16 | 0.32 |
| Francis, Vignesh | 33 | 19 | 0.38 |

Using your data visualization skills that you honed in Chapter 3, let’s visualize the distribution of these 33 sample proportions red \(\widehat{p}\) using a histogram with binwidth = 0.05. This visualization is appropriate since prop_red is a numerical variable. This histogram shows a particularly important type of distribution in statistics: the sampling distribution.

```r
ggplot(tactile_prop_red, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, color = "white") +
  labs(x = "Sample proportion red based on n = 50", title = "Sampling distribution of p-hat")
```

Figure 8.7: Sampling distribution of 33 sample proportions based on 33 tactile samples with n=50

      Sampling distributions are a specific kind of distribution: distributions of point estimates/sample statistics based on samples of size \(n\) used to estimate an unknown population parameter.


In the case of the histogram in Figure 8.7, it’s the distribution of the sample proportion red \(\widehat{p}\) based on \(n=50\) sampled balls from the bowl, with which we want to estimate the unknown population proportion \(p\) of the \(N=2400\) balls that are red. Sampling distributions describe how values of the sample proportion red \(\widehat{p}\) will vary from sample to sample due to sampling variability and thus identify “typical” and “atypical” values of \(\widehat{p}\). For example:

- Obtaining a sample that yields \(\widehat{p} = 0.36\) would be considered typical, common, and plausible since it would in theory occur frequently.
- Obtaining a sample that yields \(\widehat{p} = 0.8\) would be considered atypical, uncommon, and implausible since it lies far away from most of the distribution.

      Let’s now ask ourselves the following questions:

1. Where is the sampling distribution centered?
2. What is the spread of this sampling distribution?

Recall from Section 5.4 that the mean and the standard deviation are two summary statistics that answer these questions:

```r
tactile_prop_red %>% 
  summarize(mean = mean(prop_red), sd = sd(prop_red))
```

| mean | sd |
|---:|---:|
| 0.356 | 0.058 |

      Finally, it’s important to keep in mind:

1. If the sampling is done in an unbiased and random fashion, in other words we made sure to stir the bowl before we sampled, then the sampling distribution will be guaranteed to be centered at the true unknown population proportion red \(p\), in other words the true proportion of the 2400 balls that are red.
2. The spread of this histogram, as quantified by the standard deviation of 0.058, is called the standard error. It quantifies the variability of our estimates for \(\widehat{p}\).
    - Note: a common source of confusion. All standard errors are a form of standard deviation, but not all standard deviations are standard errors.

      8.3 Virtual sampling simulation


      Now let’s mimic the above tactile sampling, but with virtual sampling. We’ll resort to virtual sampling because while collecting 33 tactile samples manually is feasible, for large numbers like 1000, things start getting tiresome! That’s where a computer can really help: computers excel at performing mundane tasks repeatedly; think of what accounting software must be like!


      In other words:

- Instead of considering the tactile bowl shown in Figure 8.1 above and using a tactile shovel to draw samples of size \(n=50\),
- Let’s use a virtual bowl saved in a computer and use R’s random number generator as a virtual shovel to draw samples of size \(n=50\).

      First, we describe our virtual bowl. In the moderndive package, we’ve included a data frame called bowl that has 2400 rows corresponding to the \(N=2400\) balls in the physical bowl. Run View(bowl) in RStudio to convince yourselves that bowl is indeed a virtual version of the tactile bowl in the previous section.

```r
bowl
```

```
# A tibble: 2,400 x 2
   ball_ID color
     <int> <chr>
 1       1 white
 2       2 white
 3       3 white
 4       4 red  
 5       5 white
 6       6 white
 7       7 red  
 8       8 white
 9       9 red  
10      10 white
# … with 2,390 more rows
```

      Note that the balls are not actually marked with numbers; the variable ball_ID is merely used as an identification variable for each row of bowl. Recall our previous discussion on identification variables in Subsection 4.2.2 in the “Data Tidying” Chapter 4.
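
Since bowl is just a data frame, we could in principle perform a virtual “census” and compute the true proportion of red balls directly. The following is a minimal sketch of that idea, shown only as an aside: the whole point of this chapter is that a census is usually impractical, so we otherwise treat \(p\) as unknown.

```r
# A virtual "census" of the bowl: count all N = 2400 balls and compute the
# true proportion red p. (Shown only as an aside; we otherwise treat p as unknown.)
bowl %>% 
  summarize(N = n(), p = mean(color == "red"))
```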


Next, we describe our virtual shovel: the rep_sample_n() function included in the moderndive package, whose name indicates that we are taking repeated/replicated samples of size \(n\).


      8.3.1 Using shovel once


Let’s perform the virtual analogue of tactilely inserting the shovel only once into the bowl and extracting a sample of size \(n=50\). In the table below we only show the first 10 of the 50 sampled balls.

```r
virtual_shovel <- bowl %>% 
  rep_sample_n(size = 50)
View(virtual_shovel)
```

Table 8.1: First 10 sampled balls of 50 in virtual sample

| replicate | ball_ID | color |
|---:|---:|---|
| 1 | 2079 | red |
| 1 | 1076 | white |
| 1 | 1691 | red |
| 1 | 1687 | red |
| 1 | 1434 | white |
| 1 | 954 | white |
| 1 | 483 | white |
| 1 | 1520 | white |
| 1 | 2060 | red |
| 1 | 1682 | white |

Looking at all 50 rows of virtual_shovel in the spreadsheet viewer that pops up after running View(virtual_shovel) in RStudio, the ball_ID variable seems to suggest that we do indeed have a random sample of \(n=50\) balls. However, what does the replicate variable indicate, where in this case it’s equal to 1 for all 50 rows? We’ll see in a minute. First let’s compute both the number of red balls and the proportion red out of \(n=50\) using our dplyr data wrangling tools from Chapter 5:

```r
virtual_shovel %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 50)
```

Table 8.2: Count and proportion red in single virtual sample of size n = 50

| replicate | red | prop_red |
|---:|---:|---:|
| 1 | 23 | 0.46 |

Why does this work? Because for every row where color == "red", the Boolean TRUE is returned and R treats TRUE like the number 1. Equivalently, for every row where color is not equal to "red", the Boolean FALSE is returned and R treats FALSE like the number 0. So summing the TRUEs and FALSEs is equivalent to summing 1’s and 0’s, which counts the number of balls where color is red.
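
To see this coercion of TRUE/FALSE to 1/0 in isolation, here is a small sketch using a toy vector; the vector is purely illustrative and not part of the bowl data.

```r
# sum() on a logical vector counts the TRUEs; mean() gives the proportion of TRUEs.
toy_colors <- c("red", "white", "white", "red", "white")
toy_colors == "red"        # TRUE FALSE FALSE TRUE FALSE
sum(toy_colors == "red")   # 2
mean(toy_colors == "red")  # 0.4
```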


      8.3.2 Using shovel 33 times


      Recall however in our tactile sampling exercise in Section 8.2 above that we had 33 groups of students take 33 samples total of size \(n=50\) using the shovel 33 times and hence compute 33 separate values of the sample proportion red \(\widehat{p}\). In other words we repeated/replicated the sampling 33 times. We can achieve this by reusing the same rep_sample_n() function code above, but by adding the reps = 33 argument indicating we want to repeat this sampling 33 times:

```r
virtual_samples <- bowl %>% 
  rep_sample_n(size = 50, reps = 33)
View(virtual_samples)
```

      virtual_samples has \(50 \times 33 = 1650\) rows, corresponding to 33 samples of size \(n=50\), or 33 draws from the shovel. We won’t display the contents of this data frame but leave it to you to View() this data frame. You’ll see that the first 50 rows have replicate equal to 1, then the next 50 rows have replicate equal to 2, and so on and so forth, up until the last 50 rows which have replicate equal to 33. The replicate variable denotes which of our 33 samples a particular ball is included in.
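
Rather than scrolling through all 1650 rows, one quick way to convince yourself of this structure is to count the number of rows in each replicate; a short sketch:

```r
# Each of the 33 replicates should contribute exactly 50 rows, one per sampled ball.
virtual_samples %>% 
  count(replicate)
```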


Now let’s compute the 33 corresponding values of the sample proportion \(\widehat{p}\) based on 33 different samples of size \(n=50\) by reusing the previous code, but remembering to group_by the replicate variable first since we want to compute the sample proportion for each of the 33 samples separately. Notice the similarity of the resulting table with the table of 33 tactile sample proportions above.

```r
virtual_prop_red <- virtual_samples %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 50)
View(virtual_prop_red)
```

| replicate | red | prop_red |
|---:|---:|---:|
| 1 | 17 | 0.34 |
| 2 | 20 | 0.40 |
| 3 | 24 | 0.48 |
| 4 | 20 | 0.40 |
| 5 | 17 | 0.34 |
| 6 | 16 | 0.32 |
| 7 | 17 | 0.34 |
| 8 | 19 | 0.38 |
| 9 | 19 | 0.38 |
| 10 | 12 | 0.24 |
| 11 | 22 | 0.44 |
| 12 | 17 | 0.34 |
| 13 | 20 | 0.40 |
| 14 | 22 | 0.44 |
| 15 | 13 | 0.26 |
| 16 | 15 | 0.30 |
| 17 | 23 | 0.46 |
| 18 | 20 | 0.40 |
| 19 | 16 | 0.32 |
| 20 | 12 | 0.24 |
| 21 | 14 | 0.28 |
| 22 | 21 | 0.42 |
| 23 | 14 | 0.28 |
| 24 | 18 | 0.36 |
| 25 | 19 | 0.38 |
| 26 | 12 | 0.24 |
| 27 | 22 | 0.44 |
| 28 | 23 | 0.46 |
| 29 | 19 | 0.38 |
| 30 | 18 | 0.36 |
| 31 | 20 | 0.40 |
| 32 | 17 | 0.34 |
| 33 | 20 | 0.40 |

Just as we did before, let’s now visualize the sampling distribution of the 33 virtual sample proportions \(\widehat{p}\) using a histogram with binwidth = 0.05:

```r
ggplot(virtual_prop_red, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, color = "white") +
  labs(x = "Sample proportion red based on n = 50", title = "Sampling distribution of p-hat")
```

Figure 8.8: Sampling distribution of 33 sample proportions based on 33 virtual samples with n=50

The resulting sampling distribution based on our virtual sampling simulation is nearly identical to the sampling distribution of our tactile sampling simulation from Section 8.2. Let’s compare them side-by-side in Figure 8.9.

Figure 8.9: Comparison of sampling distributions based on 33 tactile & virtual samples with n=50

      We see that they are similar in terms of center and spread, although not identical due to random variation. This was in fact by design, as we made the virtual contents of the virtual bowl match the actual contents of the actual bowl pictured above.
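
If you’d like to make this comparison numerically precise rather than visual, one option is to stack the two sets of 33 sample proportions and compare their means and standard deviations; a sketch, where the type column is simply a label we add here:

```r
# Compare center and spread of the tactile and virtual sampling distributions.
bind_rows(
  tactile_prop_red %>% mutate(type = "tactile"),
  virtual_prop_red %>% mutate(type = "virtual")
) %>% 
  group_by(type) %>% 
  summarize(mean = mean(prop_red), sd = sd(prop_red))
```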


      8.3.3 Using shovel 1000 times


      In Figure 8.8, we can start seeing a pattern in the sampling distribution emerge. However, 33 values of the sample proportion \(\widehat{p}\) might not be enough to get a true sense of the distribution. Using 1000 values of \(\widehat{p}\) would definitely give a better sense. What are our two options for constructing these histograms?

1. Tactile sampling: Make the 33 groups of students take \(1000 / 33 \approx 31\) samples of size \(n=50\) each, count the number of red balls for each of the 1000 tactile samples, and then compute the 1000 corresponding values of the sample proportion \(\widehat{p}\). However, this would be cruel and unusual as this would take hours!
2. Virtual sampling: Computers are very good at automating repetitive tasks such as this one. This is the way to go!

      First, generate 1000 samples of size \(n=50\)

```r
virtual_samples <- bowl %>% 
  rep_sample_n(size = 50, reps = 1000)
View(virtual_samples)
```

      Then for each of these 1000 samples of size \(n=50\), compute the corresponding sample proportions

```r
virtual_prop_red <- virtual_samples %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 50)
View(virtual_prop_red)
```

      As previously done, let’s plot the sampling distribution of these 1000 simulated values of the sample proportion red \(\widehat{p}\) with a histogram in Figure 8.10.

```r
ggplot(virtual_prop_red, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, color = "white") +
  labs(x = "Sample proportion red based on n = 50", title = "Sampling distribution of p-hat")
```

Figure 8.10: Sampling distribution of 1000 sample proportions based on 1000 virtual samples with n=50

      Since the sampling is random and thus representative and unbiased, the above sampling distribution is centered at the true population proportion red \(p\) of all \(N=2400\) balls in the bowl. Eyeballing it, the sampling distribution appears to be centered at around 0.375.
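
Rather than eyeballing the center, we can compute it; a minimal sketch:

```r
# The mean of the 1000 sample proportions estimates the center of the sampling
# distribution, which for random sampling sits at the true proportion p.
virtual_prop_red %>% 
  summarize(mean_p_hat = mean(prop_red))
```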


      What is the standard error of the above sampling distribution of \(\widehat{p}\) based on 1000 samples of size \(n=50\)?

```r
virtual_prop_red %>% 
  summarize(SE = sd(prop_red))
```

```
# A tibble: 1 x 1
      SE
   <dbl>
1 0.0698
```

      What this value is saying might not be immediately apparent by itself to someone who is new to sampling. It’s best to first compare different standard errors for different sampling schemes based on different sample sizes \(n\). We’ll do so for samples of size \(n=25\), \(n=50\), and \(n=100\) next.


      8.3.4 Using different shovels


Recall, the sampling we just did on the computer using the rep_sample_n() function is simply a virtual version of the act of taking a tactile sample using the shovel with \(n=50\) slots seen in Figure 8.11. We visualized the variation in the resulting sample proportion red \(\widehat{p}\) in a histogram of the sampling distribution and quantified this variation using the standard error.

Figure 8.11: Tactile shovel for sampling n = 50 balls

But what if we changed the sample size to \(n=25\)? This would correspond to sampling using the shovel with \(n=25\) slots seen in Figure 8.12. What differences, if any, would you notice about the sampling distribution and the standard error?

Figure 8.12: Tactile shovel for sampling n = 25 balls

Furthermore, what if we took samples of size \(n=100\) as well? This would correspond to sampling using the shovel with \(n=100\) slots seen in Figure 8.13. What differences, if any, would you notice about the sampling distribution and the standard error for \(n=100\) as compared to \(n=50\) and \(n=25\)?

Figure 8.13: Tactile shovel for sampling n = 100 balls

      Let’s take the opportunity to review our sampling procedure and do this for 1000 virtual samples of size \(n=25\), \(n=50\), \(n=100\) each.


      Shovel with \(n=50\) slots: Take 1000 virtual samples of size \(n=50\), mimicking the act of taking 1000 tactile samples using the shovel with \(n=50\) slots:

```r
virtual_samples_50 <- bowl %>% 
  rep_sample_n(size = 50, reps = 1000)
```

      Then based on each of these 1000 virtual samples of size \(n=50\), compute the corresponding 1000 sample proportions \(\widehat{p}\) being sure to divide by 50:

```r
virtual_prop_red_50 <- virtual_samples_50 %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 50)
```

      The standard error is the standard deviation of the 1000 sample proportions \(\widehat{p}\), in other words we are quantifying how much \(\widehat{p}\) varies from sample-to-sample based on samples of size \(n=50\) due to sampling variation.

```r
virtual_prop_red_50 %>% 
  summarize(SE = sd(prop_red))
```

```
# A tibble: 1 x 1
      SE
   <dbl>
1 0.0694
```

      Shovel with \(n=25\) slots: Take 1000 virtual samples of size \(n=25\), mimicking the act of taking 1000 tactile samples using the shovel with \(n=25\) slots:

```r
virtual_samples_25 <- bowl %>% 
  rep_sample_n(size = 25, reps = 1000)
```

Then based on each of these 1000 virtual samples of size \(n=25\), compute the corresponding 1000 sample proportions \(\widehat{p}\) being sure to divide by 25:

```r
virtual_prop_red_25 <- virtual_samples_25 %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 25)
```

      The standard error is the standard deviation of the 1000 sample proportions \(\widehat{p}\), in other words we are quantifying how much \(\widehat{p}\) varies from sample-to-sample based on samples of size \(n=25\) due to sampling variation.

```r
virtual_prop_red_25 %>% 
  summarize(SE = sd(prop_red))
```

```
# A tibble: 1 x 1
     SE
  <dbl>
1 0.100
```

      Shovel with \(n=100\) slots: Take 1000 virtual samples of size \(n=100\), mimicking the act of taking 1000 tactile samples using the shovel with \(n=100\) slots:

```r
virtual_samples_100 <- bowl %>% 
  rep_sample_n(size = 100, reps = 1000)
```

      Then based on each of these 1000 virtual samples of size \(n=100\), compute the corresponding 1000 sample proportions \(\widehat{p}\) being sure to divide by 100:

```r
virtual_prop_red_100 <- virtual_samples_100 %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 100)
```

      The standard error is the standard deviation of the 1000 sample proportions \(\widehat{p}\), in other words we are quantifying how much \(\widehat{p}\) varies from sample-to-sample based on samples of size \(n=100\) due to sampling variation.

```r
virtual_prop_red_100 %>% 
  summarize(SE = sd(prop_red))
```

```
# A tibble: 1 x 1
      SE
   <dbl>
1 0.0457
```

Comparison: Let’s compare the 3 standard errors we computed above in the table below:

| n | SE |
|---:|---:|
| 25 | 0.100 |
| 50 | 0.069 |
| 100 | 0.046 |

Observe the behavior of the standard error: as \(n\) increases from \(n=25\) to \(n=50\) to \(n=100\), the standard error gets smaller. In other words, the values of \(\widehat{p}\) vary less. The standard error is a numerical quantification of the spreads of the following three histograms (on the same scale) of the sampling distribution of the sample proportion \(\widehat{p}\):

Figure 8.14: Comparing sampling distributions of p-hat for different sample sizes n

Observe that the histogram of possible \(\widehat{p}\) values is narrowest and most consistent for the \(n=100\) case. In other words, those estimates make less error. “Bigger sample size equals better sampling” is a concept you probably knew before reading this chapter. What we’ve just demonstrated is what this concept means: samples based on large sample sizes will yield point estimates that vary less around the true value and hence be less prone to error.
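
For sample proportions there is also a well-known formula for the standard error, \(\text{SE} \approx \sqrt{p(1-p)/n}\), that displays exactly this behavior. As a rough check, assuming the true proportion is near the 0.375 we eyeballed earlier (an assumption, not a value we computed exactly), the formula gives values close to the simulated standard errors in the table above; a sketch:

```r
# Rough theoretical check: SE ~ sqrt(p * (1 - p) / n), using an assumed p of about 0.375.
p <- 0.375                 # assumed ballpark value of the true proportion red
n <- c(25, 50, 100)
data.frame(n = n, theoretical_SE = sqrt(p * (1 - p) / n))
```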


In the case of our sampling bowl, the sample proportion red \(\widehat{p}\) based on samples of size \(n=100\) will vary the least around the true proportion \(p\) of the balls that are red, and thus be less prone to error. In the case of polls, as we study in the next section: representative polls based on a larger number of respondents will yield guesses that tend to be closer to the truth.


      8.4 In real-life sampling: Polls


On December 4, 2013, National Public Radio reported on a recent poll of President Obama’s approval rating among young Americans aged 18-29 in an article, Poll: Support For Obama Among Young Americans Eroding. A quote from the article:


      After voting for him in large numbers in 2008 and 2012, young Americans are souring on President Obama.


      According to a new Harvard University Institute of Politics poll, just 41 percent of millennials — adults ages 18-29 — approve of Obama’s job performance, his lowest-ever standing among the group and an 11-point drop from April.


Let’s tie together elements of this story using the concepts and terminology we learned at the outset of this chapter, along with our observations from the tactile and virtual sampling simulations:

1. Population: Who is the population of \(N\) observations of interest?
    - Bowl: \(N=2400\) identically-shaped balls
    - Obama poll: \(N = \text{?}\) young Americans aged 18-29
2. Population parameter: What is the population parameter?
    - Bowl: The true population proportion \(p\) of the balls in the bowl that are red.
    - Obama poll: The true population proportion \(p\) of young Americans who approve of Obama’s job performance.
3. Census: What would a census be in this case?
    - Bowl: Manually going over all \(N=2400\) balls and exactly computing the population proportion \(p\) of the balls that are red.
    - Obama poll: Locating all \(N = \text{?}\) young Americans (which is in the millions) and asking them if they approve of Obama’s job performance. This would be quite expensive to do!
4. Sampling: How do you acquire the sample of size \(n\) observations?
    - Bowl: Using the shovel to extract a sample of \(n=50\) balls.
    - Obama poll: One way would be to get phone records from a database and pick out \(n\) phone numbers. In the case of the above poll, the sample was of size \(n=2089\) young adults.
5. Point estimates/sample statistics: What is the summary statistic based on the sample of size \(n\) that estimates the unknown population parameter?
    - Bowl: The sample proportion \(\widehat{p}\) red of the balls in the sample of size \(n=50\).
    - Obama poll: The sample proportion red \(\widehat{p}\) of young Americans in the sample of size \(n=2089\) that approve of Obama’s job performance. In this study’s case, \(\widehat{p} = 0.41\), which is the quoted 41% figure in the article.
6. Representative sampling: Is the sampling procedure representative? In other words, do the resulting samples “look like” the population?
    - Bowl: Does our sample of \(n=50\) balls “look like” the contents of the larger set of \(N=2400\) balls in the bowl?
    - Obama poll: Does our sample of \(n=2089\) young Americans “look like” the population of all young Americans aged 18-29?
7. Generalizability: Are the samples generalizable to the greater population?
    - Bowl: Is \(\widehat{p}\) a “good guess” of \(p\)?
    - Obama poll: Is \(\widehat{p} = 0.41\) a “good guess” of \(p\)? In other words, can we confidently say that 41% of all young Americans approve of Obama?
8. Bias: Is the sampling procedure unbiased? In other words, do all observations have an equal chance of being included in the sample?
    - Bowl: Here, we would say it is unbiased. All balls are equally sized, as evidenced by the slots of the \(n=50\) shovel, and thus no particular color of ball can be favored in our samples over others.
    - Obama poll: Did all young Americans have an equal chance at being represented in this poll? For example, if this was conducted using a database of only mobile phone numbers, would people without mobile phones be included? What about if this were an internet poll on a certain news website? Would non-readers of this website be included?
9. Random sampling: Was the sampling random?
    - Bowl: As long as you mixed the bowl sufficiently before sampling, your samples would be random.
    - Obama poll: Random sampling is a necessary assumption for all of the above to work. Most articles reporting on polls take this assumption as granted. In our Obama poll, you’d have to ask the group that conducted the poll: the Harvard University Institute of Politics.

      Recall the punchline of all the above:

- If the sampling of a sample of size \(n\) is done at random, then
- The sample is unbiased and representative of the population, thus
- Any result based on the sample can generalize to the population, thus
- The point estimate/sample statistic is a “good guess” of the unknown population parameter of interest

      and thus we have inferred about the population based on our sample. In the bowl example:

- If we properly mix the balls by say stirring the bowl first, then use the shovel to extract a sample of size \(n=50\), then
- The contents of the shovel will “look like” the contents of the bowl, thus
- Any results based on the sample of \(n=50\) balls can generalize to the large bowl of \(N=2400\) balls, thus
- The sample proportion \(\widehat{p}\) of the \(n=50\) sampled balls in the shovel that are red is a “good guess” of the true population proportion \(p\) of the \(N=2400\) balls that are red.

      and thus we have inferred some new piece of information about the bowl based on our sample extracted by shovel: the proportion of balls that are red. In the Obama poll example:

- If we had a way of contacting a randomly chosen sample of 2089 young Americans and polling their approval of Obama, then
- These 2089 young Americans would “look like” the population of all young Americans, thus
- Any results based on this sample of 2089 young Americans can generalize to the entire population of all young Americans, thus
- The reported sample approval rating of 41% of these 2089 young Americans is a “good guess” of the true approval rating amongst all young Americans.

      So long story short, this poll’s guess of Obama’s approval rating was 41%. However is this the end of the story when understanding the results of a poll? If you read further in the article, it states:


      The online survey of 2,089 adults was conducted from Oct. 30 to Nov. 11, just weeks after the federal government shutdown ended and the problems surrounding the implementation of the Affordable Care Act began to take center stage. The poll’s margin of error was plus or minus 2.1 percentage points.


Note the term margin of error, which here is plus or minus 2.1 percentage points. This is saying that a typical range of errors for polls of this type is about \(\pm 2.1\%\), in other words from about 2.1% too small to about 2.1% too big. These errors are caused by sampling variation, the same sampling variation you saw studied in the histograms in Section 8.2 on our tactile sampling simulations and Section 8.3 on our virtual sampling simulations.


In the case of polls, any deviation from the true approval rating is an “error,” and a reasonable range of errors is the margin of error. We’ll see in the next chapter that this is what’s known as a 95% confidence interval for the unknown approval rating. We’ll study confidence intervals using a new package for our data science and statistical toolbox: the infer package for statistical inference.
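
As a back-of-the-envelope check, a margin of error for a proportion is approximately 2 standard errors, i.e. roughly \(2\sqrt{\widehat{p}(1-\widehat{p})/n}\); this is an approximation we lean on here, and its connection to 95% confidence intervals is made precise in the next chapter. Plugging in the poll’s reported numbers gives a value of about 0.021, in line with the reported plus or minus 2.1 percentage points; a sketch:

```r
# Back-of-the-envelope margin of error: roughly 2 standard errors for a proportion.
p_hat <- 0.41   # reported approval rating
n <- 2089       # reported sample size
2 * sqrt(p_hat * (1 - p_hat) / n)
```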


      8.5 Conclusion


      8.5.1 Central Limit Theorem


What you did in Sections 8.2 and 8.3 was demonstrate a very famous theorem, or mathematically proven truth, called the Central Limit Theorem. It loosely states that when sample means and sample proportions are based on larger and larger samples, the sampling distributions corresponding to these point estimates become:

1. More and more normal
2. More and more narrow

Shuyi Chiou, Casey Dunn, and Pathikrit Bhattacharyya created the following three minute and 38 second video explaining this crucial statistical theorem using, as examples, what else?

1. The average weight of wild bunny rabbits!
2. The average wing span of dragons!

      8.5.2 What’s to come?


      This chapter serves as an introduction to the theoretical underpinning of the statistical inference techniques that will be discussed in greater detail in Chapter 9 for confidence intervals and Chapter 10 for hypothesis testing.


      8.5.3 Script of R code


      An R script file of all R code used in this chapter is available here.

diff --git a/docs/previous_versions/v0.4.0/9-confidence-intervals.html b/docs/previous_versions/v0.4.0/9-confidence-intervals.html
new file mode 100644
index 000000000..00a84b19c
--- /dev/null
+++ b/docs/previous_versions/v0.4.0/9-confidence-intervals.html
@@ -0,0 +1,1806 @@

      9 Confidence Intervals


In Chapter 8, we explored the process of repeatedly sampling from a population to build a sampling distribution. The motivation there was to use multiple samples from the same population to visualize and attempt to understand the variability in the statistic from one sample to another. Furthermore, recall our concepts and terminology related to sampling from the beginning of Chapter 8:


Generally speaking, we learned that if the sampling of a sample of size \(n\) is done at random, then the resulting sample is unbiased and representative of the population, thus any result based on the sample can generalize to the population, and hence the point estimate/sample statistic computed from this sample is a “good guess” of the unknown population parameter of interest.


      Specific to the bowl, we learned that if we properly mix the balls first thereby ensuring the randomness of samples extracted using the shovel with \(n=50\) slots, then the contents of the shovel will “look like” the contents of the bowl, thus any results based on the sample of \(n=50\) balls can generalize to the large bowl of \(N=2400\) balls, and hence the sample proportion red \(\widehat{p}\) of the \(n=50\) balls in the shovel is a “good guess” of the true population proportion red \(p\) of the \(N=2400\) balls in the bowl.


      We emphasize that we used a point estimate/sample statistic, in this case the sample proportion \(\widehat{p}\), to estimate the unknown value of the population parameter, in this case the population proportion \(p\). In other words, we are using the sample to infer about the population.


We can however consider inferential situations other than just those involving proportions. We present a wide array of such scenarios in the table below. In all 7 cases, the point estimate/sample statistic estimates the unknown population parameter. It does so by computing summary statistics based on a sample of size \(n\).

| Scenario | Population parameter | Population Notation | Point estimate/sample statistic | Sample Notation |
|---|---|---|---|---|
| 1 | Population proportion | \(p\) | Sample proportion | \(\widehat{p}\) |
| 2 | Population mean | \(\mu\) | Sample mean | \(\overline{x}\) |
| 3 | Difference in population proportions | \(p_1 - p_2\) | Difference in sample proportions | \(\widehat{p}_1 - \widehat{p}_2\) |
| 4 | Difference in population means | \(\mu_1 - \mu_2\) | Difference in sample means | \(\overline{x}_1 - \overline{x}_2\) |
| 5 | Population standard deviation | \(\sigma\) | Sample standard deviation | \(s\) |
| 6 | Population regression intercept | \(\beta_0\) | Sample regression intercept | \(\widehat{\beta}_0\) or \(b_0\) |
| 7 | Population regression slope | \(\beta_1\) | Sample regression slope | \(\widehat{\beta}_1\) or \(b_1\) |

      We’ll cover the first four scenarios in this chapter on confidence intervals and the following one on hypothesis testing:

- Scenario 2 about means. Ex: the average age of pennies.
- Scenario 3 about differences in proportions between two groups. Ex: the difference in high school completion rates for Canadians vs non-Canadians. We call this a situation of two-sample inference.
- Scenario 4 is similar to 3, but it’s about the means of two groups. Ex: the difference in average test scores for the morning section of a class versus the afternoon section of a class. This is another situation of two-sample inference.

      In contrast to these, Scenario 5 involves a measure of spread: the standard deviation. Does the spread/variability of a sample match the spread/variability of the population? However, we leave this topic for a more intermediate course on statistical inference.


In Chapter 11 on inference for regression, we’ll cover Scenarios 6 & 7 about the regression line. In particular we’ll see that the fitted regression line from Chapter 6 on basic regression, \(\widehat{y} = b_0 + b_1 \cdot x\), is in fact an estimate of some true population regression line \(y = \beta_0 + \beta_1 \cdot x\) based on a sample of \(n\) pairs of points \((x, y)\). Ex: Recall our sample of \(n=463\) instructors at UT Austin from the evals data set in Chapter 6. Based on the results of the fitted regression model of teaching score with beauty score as an explanatory/predictor variable, what can we say about this relationship for all instructors, not just those at UT Austin?


      In most cases, we don’t have the population values as we did with the bowl of balls. We only have a single sample of data from a larger population. We’d like to be able to make some reasonable guesses about population parameters using that single sample to create a range of plausible values for a population parameter. This range of plausible values is known as a confidence interval and will be the focus of this chapter. And how do we use a single sample to get some idea of how other samples might vary in terms of their statistic values? One common way this is done is via a process known as bootstrapping that will be the focus of the beginning sections of this chapter.


      Needed packages


      Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). If needed, read Section 2.3 for information on how to install and load R packages.

```r
library(dplyr)
library(ggplot2)
library(janitor)
library(moderndive)
library(infer)
```

      DataCamp


      Our approach of using data science tools to understand the first major component of statistical inference, confidence intervals, uses the same tools as in Mine Cetinkaya-Rundel and Andrew Bray’s DataCamp courses “Inference for Numerical Data” and “Inference for Categorical Data.” If you’re interested in complementing your learning below in an interactive online environment, click on the images below to access the courses.


      9.1 Bootstrapping


      9.1.1 Data explanation


      The moderndive package contains a sample of 40 pennies collected and minted in the United States. Let’s explore this sample data first:

```r
pennies_sample
```

```
# A tibble: 40 x 2
    year age_in_2011
   <int>       <int>
 1  2005           6
 2  1981          30
 3  1977          34
 4  1992          19
 5  2005           6
 6  2006           5
 7  2000          11
 8  1992          19
 9  1988          23
10  1996          15
# … with 30 more rows
```

The pennies_sample data frame has 40 rows, each corresponding to a single penny, with two variables:

- year of minting as shown on the penny and
- age_in_2011 giving the years the penny had been in circulation from 2011 as an integer, e.g. 15, 2, etc.

      Suppose we are interested in understanding some properties of the mean age of all US pennies from this data collected in 2011. How might we go about that? Let’s begin by understanding some of the properties of pennies_sample using data wrangling from Chapter 5 and data visualization from Chapter 3.


      9.1.2 Exploratory data analysis


      First, let’s visualize the values in this sample as a histogram:

```r
ggplot(pennies_sample, aes(x = age_in_2011)) +
  geom_histogram(bins = 10, color = "white")
```

      We see a roughly symmetric distribution here that has quite a few values near 20 years in age with only a few larger than 40 years or smaller than 5 years. If pennies_sample is a representative sample from the population, we’d expect the age of all US pennies collected in 2011 to have a similar shape, a similar spread, and similar measures of central tendency like the mean.


      So where does the mean value fall for this sample? This point will be known as our point estimate and provides us with a single number that could serve as the guess to what the true population mean age might be. Recall how to find this using the dplyr package:

```r
x_bar <- pennies_sample %>% 
  summarize(stat = mean(age_in_2011))
x_bar
```

```
# A tibble: 1 x 1
   stat
  <dbl>
1  25.1
```

We’ve denoted this sample mean as \(\bar{x}\), which is the standard symbol for denoting the mean of a sample. Our point estimate is, thus, \(\bar{x} = 25.1\). Note though that this is just one sample, providing just one guess at the population mean. What if we’d like to have another guess?


This should all sound similar to what we did in Chapter 8. There, instead of collecting just a single scoop of balls, we had many different students use the shovel to scoop different samples of red and white balls. We then calculated a sample statistic (the sample proportion) from each sample. But, we don’t have a population to pull from here with the pennies. We only have this one sample.


The process of bootstrapping allows us to use a single sample to generate many different samples that will act as our way of approximating a sampling distribution, using a created bootstrap distribution instead. We will pull ourselves up by our bootstraps using a single sample (pennies_sample) to get an idea of the grander sampling distribution.


      9.1.3 The Bootstrapping Process


      Bootstrapping uses a process of sampling with replacement from our original sample to create new bootstrap samples of the same size as our original sample. We can again make use of the rep_sample_n() function to explore what one such bootstrap sample would look like. Remember that we are randomly sampling from the original sample here with replacement and that we always use the same sample size for the bootstrap samples as the size of the original sample (pennies_sample).

```r
bootstrap_sample1 <- pennies_sample %>% 
  rep_sample_n(size = 40, replace = TRUE, reps = 1)
bootstrap_sample1
```

```
# A tibble: 40 x 3
# Groups:   replicate [1]
   replicate  year age_in_2011
       <int> <int>       <int>
 1         1  1983          28
 2         1  2000          11
 3         1  2004           7
 4         1  1981          30
 5         1  1993          18
 6         1  2006           5
 7         1  1981          30
 8         1  2004           7
 9         1  1992          19
10         1  1994          17
# … with 30 more rows
```

      Let’s visualize what this new bootstrap sample looks like:

```r
ggplot(bootstrap_sample1, aes(x = age_in_2011)) +
  geom_histogram(bins = 10, color = "white")
```

      We now have another sample from what we could assume comes from the population of interest. We can similarly calculate the sample mean of this bootstrap sample, called a bootstrap statistic.

```r
bootstrap_sample1 %>% 
  summarize(stat = mean(age_in_2011))
```

```
# A tibble: 1 x 2
  replicate  stat
      <int> <dbl>
1         1  23.2
```

      We can see that this sample mean is smaller than the x_bar value we calculated earlier for the pennies_sample data. We’ll come back to analyzing the different bootstrap statistic values shortly.


      Let’s recap what was done to get to this bootstrap sample using a tactile explanation:

1. First, pretend that each of the 40 values of age_in_2011 in pennies_sample were written on a small piece of paper. Recall that these values were 6, 30, 34, 19, 6, etc.
2. Now, put the 40 small pieces of paper into a receptacle such as a baseball cap.
3. Shake up the pieces of paper.
4. Draw “at random” from the cap to select one piece of paper.
5. Write down the value on this piece of paper. Say that it is 28.
6. Now, place this piece of paper containing 28 back into the cap.
7. Draw “at random” again from the cap to select a piece of paper. Note that this is the sampling with replacement part since you may draw 28 again.
8. Repeat this process until you have drawn 40 pieces of paper and written down the values on these 40 pieces of paper. Completing this repetition produces ONE bootstrap sample.

      If you look at the values in bootstrap_sample1, you can see how this process plays out. We originally drew 28, then we drew 11, then 7, and so on. Of course, we didn’t actually use pieces of paper and a cap here. We just had the computer perform this process for us to produce bootstrap_sample1 using rep_sample_n() with replace = TRUE set.


      The process of sampling with replacement is how we can use the original sample to take a guess as to what other values in the population may be. Sometimes in these bootstrap samples, we will select lots of larger values from the original sample, sometimes we will select lots of smaller values, and most frequently we will select values that are near the center of the sample. Let’s explore what the distribution of values of age_in_2011 for six different bootstrap samples looks like to further understand this variability.

```r
six_bootstrap_samples <- pennies_sample %>% 
  rep_sample_n(size = 40, replace = TRUE, reps = 6)

ggplot(six_bootstrap_samples, aes(x = age_in_2011)) +
  geom_histogram(bins = 10, color = "white") +
  facet_wrap(~ replicate)
```

      We can also look at the six different means using dplyr syntax:

```r
six_bootstrap_samples %>% 
  group_by(replicate) %>% 
  summarize(stat = mean(age_in_2011))
```

```
# A tibble: 6 x 2
  replicate  stat
      <int> <dbl>
1         1  23.6
2         2  24.1
3         3  25.2
4         4  23.1
5         5  24.0
6         6  24.7
```

      Instead of doing this six times, we could do it 1000 times and then look at the distribution of stat across all 1000 of the replicates. This sets the stage for the infer R package (Bray et al. 2019) that was created to help users perform statistical inference such as confidence intervals and hypothesis tests using verbs similar to what you’ve seen with dplyr. We’ll walk through setting up each of the infer verbs for confidence intervals using this pennies_sample example, while also explaining the purpose of the verbs in a general framework.
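
In fact, we could already build such a bootstrap distribution with the Chapter 8 tools alone; here is a sketch using only rep_sample_n() and dplyr, before we switch to the infer verbs:

```r
# 1000 bootstrap samples of size 40 and the mean of each, built with only
# rep_sample_n() and dplyr verbs.
bootstrap_means <- pennies_sample %>% 
  rep_sample_n(size = 40, replace = TRUE, reps = 1000) %>% 
  group_by(replicate) %>% 
  summarize(stat = mean(age_in_2011))
```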


      9.2 The infer package for statistical inference


      The infer package makes great use of the %>% to create a pipeline for statistical inference. The goal of the package is to provide a way for its users to explain the computational process of confidence intervals and hypothesis tests using the code as a guide. The verbs build in order here, so you’ll want to start with specify() and then continue through the others as needed.
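
As a preview, here is a sketch of how these verbs will chain together for the pennies example; each verb is unpacked one at a time in the subsections that follow:

```r
# Preview of the full infer pipeline built up in the rest of this section.
pennies_sample %>% 
  specify(response = age_in_2011) %>%            # which variable we infer about
  generate(reps = 1000, type = "bootstrap") %>%  # 1000 bootstrap resamples
  calculate(stat = "mean") %>%                   # the mean of each resample
  visualize()                                    # histogram of the bootstrap distribution
```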


      9.2.1 Specify variables


The specify() function is used primarily to choose which variables will be the focus of the statistical inference. In addition, this is where you set which variable acts as the explanatory variable and which acts as the response variable. For proportion problems like those in Chapter 8, we can also specify which of the different levels we would like to count as a success. We’ll see further examples of these options in this chapter, Chapter 10, and in Appendix B.


      To begin to create a confidence interval for the population mean age of US pennies in 2011, we start by using specify() to choose which variable in our pennies_sample data we’d like to work with. This can be done in one of two ways:

1. Using the response argument:

```r
pennies_sample %>% 
  specify(response = age_in_2011)
```

```
Response: age_in_2011 (integer)
# A tibble: 40 x 1
   age_in_2011
         <int>
 1           6
 2          30
 3          34
 4          19
 5           6
 6           5
 7          11
 8          19
 9          23
10          15
# … with 30 more rows
```

2. Using formula notation:

```r
pennies_sample %>% 
  specify(formula = age_in_2011 ~ NULL)
```

```
Response: age_in_2011 (integer)
# A tibble: 40 x 1
   age_in_2011
         <int>
 1           6
 2          30
 3          34
 4          19
 5           6
 6           5
 7          11
 8          19
 9          23
10          15
# … with 30 more rows
```

      Note that the formula notation uses the common R methodology to include the response \(y\) variable on the left of the ~ and the explanatory \(x\) variable on the right of the “tilde.” Recall that you used this notation frequently with the lm() function in Chapters 6 and 7 when fitting regression models. Either notation works just fine, but a preference is usually given here for the formula notation to further build on the ideas from earlier chapters.


      9.2.2 Generate replicates


      After specify()ing the variables we’d like in our inferential analysis, we next feed that into the generate() verb. The generate() verb’s main argument is reps, which is used to give how many different repetitions one would like to perform. Another argument here is type, which is automatically determined by the kinds of variables passed into specify(). We can also be explicit and set this type to be type = "bootstrap". This type argument will be further used in hypothesis testing in Chapter 10 as well. Make sure to check out ?generate to see the options here and use the ? operator to better understand other verbs as well.


      Let’s generate() 1000 bootstrap samples:

```r
thousand_bootstrap_samples <- pennies_sample %>% 
  specify(response = age_in_2011) %>% 
  generate(reps = 1000)
```

      We can use the dplyr count() function to help us understand what the thousand_bootstrap_samples data frame looks like:

      +
      thousand_bootstrap_samples %>% count(replicate)
      +
      # A tibble: 1,000 x 2
      +# Groups:   replicate [1,000]
      +   replicate     n
      +       <int> <int>
      + 1         1    40
      + 2         2    40
      + 3         3    40
      + 4         4    40
      + 5         5    40
      + 6         6    40
      + 7         7    40
      + 8         8    40
      + 9         9    40
      +10        10    40
      +# … with 990 more rows
      +

      Notice that each replicate has 40 entries here. Now that we have 1000 different bootstrap samples, our next step is to calculate the bootstrap statistics for each sample.


      9.2.3 Calculate summary statistics

      +

      +

      After generate()ing many different samples, we next want to condense those samples down into a single statistic for each replicated sample. As seen in the diagram, the calculate() function is helpful here.

      +

      As we did at the beginning of this chapter, we now want to calculate the mean age_in_2011 for each bootstrap sample. To do so, we use the stat argument and set it to "mean" below. The stat argument has a variety of different options here and we will see further examples of this throughout the remaining chapters.

      +
      bootstrap_distribution <- pennies_sample %>% 
      +  specify(response = age_in_2011) %>% 
      +  generate(reps = 1000) %>% 
      +  calculate(stat = "mean")
      +bootstrap_distribution
      +
      # A tibble: 1,000 x 2
      +   replicate  stat
      +       <int> <dbl>
      + 1         1  26.5
      + 2         2  25.4
      + 3         3  26.0
      + 4         4  26  
      + 5         5  25.2
      + 6         6  29.0
      + 7         7  22.8
      + 8         8  26.4
      + 9         9  24.9
      +10        10  28.1
      +# … with 990 more rows
      +

      We see that the resulting data has 1000 rows and 2 columns corresponding to the 1000 replicates and the mean for each bootstrap sample.

      +
      +

      Observed statistic / point estimate calculations

      +

Just as group_by() %>% summarize() produces a useful workflow in dplyr, we can also use specify() %>% calculate() to compute summary measures on our original sample data. It's often helpful, both in confidence interval calculations and in hypothesis testing, to identify what the corresponding statistic is in the original data. For our example on penny age, we computed above a value of x_bar using the summarize() verb in dplyr:

      +
      pennies_sample %>% 
      +  summarize(stat = mean(age_in_2011))
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  25.1
      +

This can also be done by skipping the generate() step in the pipeline, feeding specify() directly into calculate():

      +
      pennies_sample %>% 
      +  specify(response = age_in_2011) %>% 
      +  calculate(stat = "mean")
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  25.1
      +

      This shortcut will be particularly useful when the calculation of the observed statistic is tricky to do using dplyr alone. This is particularly the case when working with more than one variable as will be seen in Chapter 10.


      9.2.4 Visualize the results

      +

      +

      The visualize() verb provides a simple way to view the bootstrap distribution as a histogram of the stat variable values. It has many other arguments that one can use as well including the shading of the histogram values corresponding to the confidence interval values.

      +
      bootstrap_distribution %>% visualize()
      +

      +

      The shape of this resulting distribution may look familiar to you. It resembles the well-known normal (bell-shaped) curve.

      +

      The following diagram recaps the infer pipeline for creating a bootstrap distribution.

      +


      9.3 Now to confidence intervals

      +

      Definition: Confidence Interval

      +

      A confidence interval gives a range of plausible values for a parameter. It depends on a specified confidence level with higher confidence levels corresponding to wider confidence intervals and lower confidence levels corresponding to narrower confidence intervals. Common confidence levels include 90%, 95%, and 99%.

      +

      Usually we don’t just begin sections with a definition, but confidence intervals are simple to define and play an important role in the sciences and any field that uses data. You can think of a confidence interval as playing the role of a net when fishing. Instead of just trying to catch a fish with a single spear (estimating an unknown parameter by using a single point estimate/statistic), we can use a net to try to provide a range of possible locations for the fish (use a range of possible values based around our statistic to make a plausible guess as to the location of the parameter).

      +

      The bootstrapping process will provide bootstrap statistics that have a bootstrap distribution with center at (or extremely close to) the mean of the original sample. This can be seen by giving the observed statistic obs_stat argument the value of the point estimate x_bar.

      +
      bootstrap_distribution %>% visualize(obs_stat = x_bar)
      +

      +

      We can also compute the mean of the bootstrap distribution of means to see how it compares to x_bar:

      +
      bootstrap_distribution %>% 
      +  summarize(mean_of_means = mean(stat))
      +
      # A tibble: 1 x 1
      +  mean_of_means
      +          <dbl>
      +1          25.1
      +

      In this case, we can see that the bootstrap distribution provides us a guess as to what the variability in different sample means may look like only using the original sample as our guide. We can quantify this variability in the form of a 95% confidence interval in a couple different ways.

      +
      +

      9.3.1 The percentile method

      +

      One way to calculate a range of plausible values for the unknown mean age of coins in 2011 is to use the middle 95% of the bootstrap_distribution to determine our endpoints. Our endpoints are thus at the 2.5th and 97.5th percentiles. This can be done with infer using the get_ci() function. (You can also use the conf_int() or get_confidence_interval() functions here as they are aliases that work the exact same way.)

      +
      bootstrap_distribution %>% 
      +  get_ci(level = 0.95, type = "percentile")
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   21.0    29.3
      +

      These options are the default values for level and type so we can also just do:

      +
      percentile_ci <- bootstrap_distribution %>% 
      +  get_ci()
      +percentile_ci
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   21.0    29.3
      +

      Using the percentile method, our range of plausible values for the mean age of US pennies in circulation in 2011 is 20.972 years to 29.252 years. We can use the visualize() function to view this using the endpoints and direction arguments, setting direction to "between" (between the values) and endpoints to be those stored with name percentile_ci.

      +
      bootstrap_distribution %>% 
      +  visualize(endpoints = percentile_ci, direction = "between")
      +

      +

      You can see that 95% of the data stored in the stat variable in bootstrap_distribution falls between the two endpoints with 2.5% to the left outside of the shading and 2.5% to the right outside of the shading. The cut-off points that provide our range are shown with the darker lines.


      9.3.2 The standard error method

      +

      If the bootstrap distribution is close to symmetric and bell-shaped, we can also use a shortcut formula for determining the lower and upper endpoints of the confidence interval. This is done by using the formula \(\bar{x} \pm (multiplier * SE),\) where \(\bar{x}\) is our original sample mean and \(SE\) stands for standard error and corresponds to the standard deviation of the bootstrap distribution. The value of \(multiplier\) here is the appropriate percentile of the standard normal distribution.

      +

      These are automatically calculated when level is provided with level = 0.95 being the default. (95% of the values in a standard normal distribution fall within 1.96 standard deviations of the mean, so \(multiplier = 1.96\) for level = 0.95, for example.) As mentioned, this formula assumes that the bootstrap distribution is symmetric and bell-shaped. This is often the case with bootstrap distributions, especially those in which the original distribution of the sample is not highly skewed.
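As a rough sketch (not output shown in the book), the same calculation could be done "by hand" from the bootstrap distribution, using pull() from dplyr to extract the numeric values from the x_bar and bootstrap_distribution objects created earlier:

# A sketch of the standard error method done "by hand"
multiplier <- qnorm(0.975)                   # roughly 1.96 for a 95% level
se <- bootstrap_distribution %>% 
  summarize(se = sd(stat)) %>% 
  pull(se)
x_bar_value <- x_bar %>% pull(stat)          # numeric value of the sample mean
c(lower = x_bar_value - multiplier * se, 
  upper = x_bar_value + multiplier * se)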

      +

      Definition: standard error

      +

      The standard error is the standard deviation of the sampling distribution.

      +

      The variability of the sampling distribution may be approximated by the variability of the bootstrap distribution. Traditional theory-based methodologies for inference also have formulas for standard errors, assuming some conditions are met.

      +

      This \(\bar{x} \pm (multiplier * SE)\) formula is implemented in the get_ci() function as shown with our pennies problem using the bootstrap distribution’s variability as an approximation for the sampling distribution’s variability. We’ll see more on this approximation shortly.

      +

      Note that the center of the confidence interval (the point_estimate) must be provided for the standard error confidence interval.

      +
      standard_error_ci <- bootstrap_distribution %>% 
      +  get_ci(type = "se", point_estimate = x_bar)
      +standard_error_ci
      +
      # A tibble: 1 x 2
      +  lower upper
      +  <dbl> <dbl>
      +1  21.0  29.3
      +
      bootstrap_distribution %>% 
      +  visualize(endpoints = standard_error_ci, direction = "between")
      +

      +

      We see that both methods produce nearly identical confidence intervals with the percentile method being \([20.97, 29.25]\) and the standard error method being \([20.97, 29.28]\).


      9.4 Comparing bootstrap and sampling distributions

      +

To help build up the idea of a confidence interval, we weren't completely honest in our initial discussion. The pennies_sample data frame represents a sample from a larger number of pennies stored as pennies in the moderndive package. The pennies data frame (also in the moderndive package) contains 800 rows of data and two columns pertaining to the same variables as pennies_sample. Let's begin by understanding some of the properties of the age_in_2011 variable in the pennies data frame.

      +
      ggplot(pennies, aes(x = age_in_2011)) +
      +  geom_histogram(bins = 10, color = "white")
      +

      +
      pennies %>% 
      +  summarize(mean_age = mean(age_in_2011),
      +            median_age = median(age_in_2011))
      +
      # A tibble: 1 x 2
      +  mean_age median_age
      +     <dbl>      <dbl>
      +1     21.2         20
      +

      We see that pennies is slightly right-skewed with the mean being pulled towards the upper outliers. Recall that pennies_sample was more symmetric than pennies. In fact, it actually exhibited some left-skew as we compare the mean and median values.

      +
      ggplot(pennies_sample, aes(x = age_in_2011)) +
      +  geom_histogram(bins = 10, color = "white")
      +

      +
      pennies_sample %>% 
      +  summarize(mean_age = mean(age_in_2011),
      +            median_age = median(age_in_2011))
      +
      # A tibble: 1 x 2
      +  mean_age median_age
      +     <dbl>      <dbl>
      +1     25.1       25.5
      +
      +

      Sampling distribution

      +

      Let’s assume that pennies represents our population of interest. We can then create a sampling distribution for the population mean age of pennies, denoted by the Greek letter \(\mu\), using the rep_sample_n() function seen in Chapter 8. First we will create 1000 samples from the pennies data frame.

      +
      thousand_samples <- pennies %>% 
      +  rep_sample_n(size = 40, reps = 1000, replace = FALSE)
      +

      When creating a sampling distribution, we do not replace the items when we create each sample. This is in contrast to the bootstrap distribution. It’s important to remember that the sampling distribution is sampling without replacement from the population to better understand sample-to-sample variability, whereas the bootstrap distribution is sampling with replacement from our original sample to better understand potential sample-to-sample variability.
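To make the contrast concrete, here is a sketch (not run in the book) of what the two calls would look like side by side; only the replace argument and the data frame being sampled from differ.

# Sampling distribution: sample the population without replacement
pennies %>% 
  rep_sample_n(size = 40, reps = 1000, replace = FALSE)

# Bootstrap distribution: resample the original sample with replacement
pennies_sample %>% 
  rep_sample_n(size = 40, reps = 1000, replace = TRUE)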

      +

      After sampling from pennies 1000 times, we next want to compute the mean age for each of the 1000 samples:

      +
      sampling_distribution <- thousand_samples %>% 
      +  group_by(replicate) %>% 
      +  summarize(stat = mean(age_in_2011))
      +

We could use ggplot() with geom_histogram() again, but since we've named our column in summarize() to be stat, we can also use the shortcut visualize() function from infer, specifying the number of bins and filling the bars with a different color such as "salmon". This color choice will help us remember that "salmon" corresponds to "sampling distribution".

      +
      sampling_distribution %>% 
      +  visualize(bins = 10, fill = "salmon")
      +
Figure 9.1: Sampling distribution for n=40 samples of pennies

      +
      +

      We can also examine the variability in this sampling distribution by calculating the standard deviation of the stat column. Remember that the standard deviation of the sampling distribution is the standard error, frequently denoted as se.

      +
      sampling_distribution %>% 
      +  summarize(se = sd(stat))
      +
      # A tibble: 1 x 1
      +     se
      +  <dbl>
      +1  2.01
      +
      +
      +

      Bootstrap distribution

      +

      Let’s now see how the shape of the bootstrap distribution compares to that of the sampling distribution. We’ll shade the bootstrap distribution blue to further assist with remembering which is which.

      +
      bootstrap_distribution %>% 
      +  visualize(bins = 10, fill = "blue")
      +

      +
      bootstrap_distribution %>% 
      +  summarize(se = sd(stat))
      +
      # A tibble: 1 x 1
      +     se
      +  <dbl>
      +1  2.12
      +

      Notice that while the standard deviations are similar, the center of the sampling distribution and the bootstrap distribution differ:

      +
      sampling_distribution %>% 
      +  summarize(mean_of_sampling_means = mean(stat))
      +
      # A tibble: 1 x 1
      +  mean_of_sampling_means
      +                   <dbl>
      +1                   21.2
      +
      bootstrap_distribution %>% 
      +  summarize(mean_of_bootstrap_means = mean(stat))
      +
      # A tibble: 1 x 1
      +  mean_of_bootstrap_means
      +                    <dbl>
      +1                    25.1
      +

      Since the bootstrap distribution is centered at the original sample mean, it doesn’t necessarily provide a good estimate of the overall population mean \(\mu\). Let’s calculate the mean of age_in_2011 for the pennies data frame to see how it compares to the mean of the sampling distribution and the mean of the bootstrap distribution.

      +
      pennies %>% 
      +  summarize(overall_mean = mean(age_in_2011))
      +
      # A tibble: 1 x 1
      +  overall_mean
      +         <dbl>
      +1         21.2
      +

Notice that this value matches up well with the mean of the sampling distribution. This is a consequence of the Central Limit Theorem introduced in Chapter 8: the mean of the sampling distribution is expected to be the mean of the overall population.

      +

The unfortunate fact, though, is that in nearly all circumstances we don't know the population mean. The motivation for presenting it here was to show that the theory behind the Central Limit Theorem holds, using the tools you've worked with so far: the ggplot2, dplyr, moderndive, and infer packages.

      +

If the sample mean is not guaranteed to be a good guess for the population mean, how should we go about estimating what the population mean may be when we can only select samples from the population? We've now come full circle and can discuss the underpinnings of the confidence interval and ways to interpret it.


      9.5 Interpreting the confidence interval

      +

      As shown above in Subsection 9.3.1, one range of plausible values for the population mean age of pennies in 2011, denoted by \(\mu\), is \([20.97, 29.25]\). Recall that this confidence interval is based on bootstrapping using pennies_sample. Note that the mean of pennies (21.152) does fall in this confidence interval. If we had a different sample of size 40 and constructed a confidence interval using the same method, would we be guaranteed that it contained the population parameter value as well? Let’s try it out:

      +
      pennies_sample2 <- pennies %>% 
      +  sample_n(size = 40)
      +

      Note the use of the sample_n() function in the dplyr package here. This does the same thing as rep_sample_n(reps = 1) but omits the extra replicate column.

      +

      We next create an infer pipeline to generate a percentile-based 95% confidence interval for \(\mu\):

      +
      percentile_ci2 <- pennies_sample2 %>% 
      +  specify(formula = age_in_2011 ~ NULL) %>% 
      +  generate(reps = 1000) %>% 
      +  calculate(stat = "mean") %>% 
      +  get_ci()
      +percentile_ci2
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   18.4    25.3
      +

      This new confidence interval also contains the value of \(\mu\). Let’s further investigate by repeating this process 100 times to get 100 different confidence intervals derived from 100 different samples of pennies. Each sample will have size of 40 just as the original sample. We will plot each of these confidence intervals as horizontal lines. We will also show a line corresponding to the known population value of 21.152 years.
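The code that produced the figure is not shown in the book; the following is a minimal sketch of one way such a plot could be built (the object name pennies_cis and the use of purrr::map_dfr() are our own choices, not the book's code):

# Sketch: 100 percentile-based 95% CIs from 100 samples of size 40
library(purrr)

pennies_cis <- map_dfr(1:100, function(i) {
  pennies %>% 
    sample_n(size = 40) %>% 
    specify(response = age_in_2011) %>% 
    generate(reps = 1000, type = "bootstrap") %>% 
    calculate(stat = "mean") %>% 
    get_ci()
}, .id = "sample")

ggplot(pennies_cis, aes(y = as.integer(sample))) +
  geom_segment(aes(x = `2.5%`, xend = `97.5%`, yend = as.integer(sample))) +
  geom_vline(xintercept = 21.152, color = "red") +
  labs(x = "Age in 2011 (years)", y = "Sample")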

      +

      +

      Of the 100 confidence intervals based on samples of size \(n = 40\), 96 of them captured the population mean \(\mu = 21.152\), whereas 4 of them did not include it. If we repeated this process of building confidence intervals more times with more samples, we’d expect 95% of them to contain the population mean. In other words, the procedure we have used to generate confidence intervals is “95% reliable” in that we can expect it to include the true population parameter 95% of the time if the process is repeated.

      +

      To further accentuate this point, let’s perform a similar procedure using 90% confidence intervals instead. This time we will use the standard error method instead of the percentile method for computing the confidence intervals.
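For a single sample, such an interval could be computed as in the following sketch (reusing pennies_sample2 from above; the object name x_bar2 is our own):

# Sketch: a 90% standard-error-method confidence interval from one sample
x_bar2 <- pennies_sample2 %>% 
  specify(response = age_in_2011) %>% 
  calculate(stat = "mean")

pennies_sample2 %>% 
  specify(response = age_in_2011) %>% 
  generate(reps = 1000, type = "bootstrap") %>% 
  calculate(stat = "mean") %>% 
  get_ci(level = 0.90, type = "se", point_estimate = x_bar2)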

      +

      +

Of the 100 confidence intervals based on samples of size \(n = 40\), 87 of them captured the population mean \(\mu = 21.152\), whereas 13 of them did not include it. Repeating this process for more samples would result in us getting closer and closer to 90% of the confidence intervals including the true value. When interpreting a confidence interval, it is common to say we are "95% confident" or "90% confident" that the true value falls within its range. We will use this "confident" language throughout the rest of this chapter, but remember that it has more to do with the reliability of the interval-building process.

      +
      +

      Back to our pennies example

      +

      After this elaboration on what the level corresponds to in a confidence interval, let’s conclude by providing an interpretation of the original confidence interval result we found in Subsection 9.3.1.

      +

      Interpretation: We are 95% confident that the true mean age of pennies in circulation in 2011 is between 20.972 and 29.252 years. This level of confidence is based on the percentile-based method including the true mean 95% of the time if many different samples (not just the one we used) were collected and confidence intervals were created.


      9.6 EXAMPLE: One proportion

      +

      Let’s revisit our exercise of trying to estimate the proportion of red balls in the bowl from Chapter 8. We are now interested in determining a confidence interval for population parameter \(p\), the proportion of balls that are red out of the total \(N = 2400\) red and white balls.

      +

      We will use the first sample reported from Ilyas and Yohan in Subsection 8.2.2 for our point estimate. They observed 21 red balls out of the 50 in their shovel. This data is stored in the tactile_shovel1 data frame in the moderndive package.

      tactile_shovel1
      +
      # A tibble: 50 x 1
      +   color
      +   <chr>
      + 1 red  
      + 2 red  
      + 3 white
      + 4 red  
      + 5 white
      + 6 red  
      + 7 red  
      + 8 white
      + 9 red  
      +10 white
      +# … with 40 more rows
      +
      +

      9.6.1 Observed Statistic

      +

      To compute the proportion that are red in this data we can use the specify() %>% calculate() workflow. Note the use of the success argument here to clarify which of the two colors "red" or "white" we are interested in.

      +
      p_hat <- tactile_shovel1 %>% 
      +  specify(formula = color ~ NULL, success = "red") %>% 
      +  calculate(stat = "prop")
      +p_hat
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  0.42
      +
      +
      +

      9.6.2 Bootstrap distribution

      +

      Next we want to calculate many different bootstrap samples and their corresponding bootstrap statistic (the proportion of red balls). We’ve done 1000 in the past, but let’s go up to 10,000 now to better see the resulting distribution. Recall that this is done by including a generate() function call in the middle of our pipeline:

      +
      tactile_shovel1 %>% 
      +  specify(formula = color ~ NULL, success = "red") %>% 
      +  generate(reps = 10000)
      +

      This results in 50 rows for each of the 10,000 replicates. Lastly, we finish the infer pipeline by adding back in the calculate() step.

      +
      bootstrap_props <- tactile_shovel1 %>% 
      +  specify(formula = color ~ NULL, success = "red") %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "prop")
      +

      Let’s visualize() what the resulting bootstrap distribution looks like as a histogram. We’ve adjusted the number of bins here as well to better see the resulting shape.

      +
      bootstrap_props %>% visualize(bins = 25)
      +

      +

      We see that the resulting distribution is symmetric and bell-shaped so it doesn’t much matter which confidence interval method we choose. Let’s use the standard error method to create a 95% confidence interval.

      +
      standard_error_ci <- bootstrap_props %>% 
      +  get_ci(type = "se", level = 0.95, point_estimate = p_hat)
      +standard_error_ci
      +
      # A tibble: 1 x 2
      +  lower upper
      +  <dbl> <dbl>
      +1 0.284 0.556
      +
      bootstrap_props %>% 
      +  visualize(bins = 25, endpoints = standard_error_ci)
      +

      +

We are 95% confident that the true proportion of red balls in the bowl is between 0.284 and 0.556. This level of confidence is based on the standard error-based method including the true proportion 95% of the time if many different samples (not just the one we used) were collected and confidence intervals were created.


      9.6.3 Theory-based confidence intervals

      +

      When the bootstrap distribution has the nice symmetric, bell shape that we saw in the red balls example above, we can also use a formula to quantify the standard error. This provides another way to compute a confidence interval, but is a little more tedious and mathematical. The steps are outlined below. We’ve also shown how we can use the confidence interval (CI) interpretation in this case as well to support your understanding of this tricky concept.

      +
      +

      Procedure for building a theory-based CI for \(p\)

      +

To construct a theory-based confidence interval for \(p\), the unknown true population proportion, we follow the steps below; a short worked example follows the procedure.

      +
1. Collect a sample of size \(n\)
2. Compute \(\widehat{p}\)
3. Compute the standard error \[\text{SE} = \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
4. Compute the margin of error \[\text{MoE} = 1.96 \cdot \text{SE} = 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
5. Compute both end points of the confidence interval:
  • The lower end point lower_ci: \[\widehat{p} - \text{MoE} = \widehat{p} - 1.96 \cdot \text{SE} = \widehat{p} - 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
  • The upper end point upper_ci: \[\widehat{p} + \text{MoE} = \widehat{p} + 1.96 \cdot \text{SE} = \widehat{p} + 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
6. Alternatively, you can succinctly summarize a 95% confidence interval for \(p\) using the \(\pm\) symbol:

\[\widehat{p} \pm \text{MoE} = \widehat{p} \pm 1.96 \cdot \text{SE} = \widehat{p} \pm 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
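As a quick check of this formula, here is the calculation for Ilyas and Yohan's sample (\(\widehat{p} = 0.42\), \(n = 50\)) done directly in R; the object names here are our own, not the book's:

# Worked example of the theory-based CI formula for one sample
p_hat_obs <- 0.42
n <- 50
SE <- sqrt(p_hat_obs * (1 - p_hat_obs) / n)               # roughly 0.070
MoE <- 1.96 * SE                                          # roughly 0.137
c(lower_ci = p_hat_obs - MoE, upper_ci = p_hat_obs + MoE) # roughly (0.283, 0.557)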


      Confidence intervals based on 33 tactile samples

      +

      Let’s load the tactile sampling data for the 33 groups from Chapter 8. Recall this data was saved in the tactile_prop_red data frame included in the moderndive package.

      tactile_prop_red
      +

      Let’s now apply the above procedure for constructing confidence intervals for \(p\) using the data saved in tactile_prop_red by adding/modifying new columns using the dplyr package data wrangling tools seen in Chapter 5:

      +
1. Rename prop_red to p_hat, the official name of the sample proportion
2. Make explicit the sample size n of \(n = 50\)
3. the standard error SE
4. the margin of error MoE
5. the left endpoint of the confidence interval lower_ci
6. the right endpoint of the confidence interval upper_ci
      +
      conf_ints <- tactile_prop_red %>% 
      +  rename(p_hat = prop_red) %>% 
      +  mutate(
      +    n = 50,
      +    SE = sqrt(p_hat * (1 - p_hat) / n),
      +    MoE = 1.96 * SE,
      +    lower_ci = p_hat - MoE,
      +    upper_ci = p_hat + MoE
      +  )
      +conf_ints
group | red_balls | p_hat | n | SE | MoE | lower_ci | upper_ci
Ilyas, Yohan | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
Morgan, Terrance | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471
Martin, Thomas | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
Clark, Frank | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
Riddhi, Karina | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493
Andrew, Tyler | 19 | 0.38 | 50 | 0.069 | 0.135 | 0.245 | 0.515
Julia | 19 | 0.38 | 50 | 0.069 | 0.135 | 0.245 | 0.515
Rachel, Lauren | 11 | 0.22 | 50 | 0.059 | 0.115 | 0.105 | 0.335
Daniel, Caroline | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427
Josh, Maeve | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471
Emily, Emily | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449
Conrad, Emily | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493
Oliver, Erik | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471
Isabel, Nam | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
X, Claire | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427
Cindy, Kimberly | 20 | 0.40 | 50 | 0.069 | 0.136 | 0.264 | 0.536
Kevin, James | 11 | 0.22 | 50 | 0.059 | 0.115 | 0.105 | 0.335
Nam, Isabelle | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
Harry, Yuko | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427
Yuki, Eileen | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449
Ramses | 23 | 0.46 | 50 | 0.070 | 0.138 | 0.322 | 0.598
Joshua, Elizabeth, Stanley | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427
Siobhan, Jane | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493
Jack, Will | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449
Caroline, Katie | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
Griffin, Y | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493
Kaitlin, Jordan | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471
Ella, Garrett | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493
Julie, Hailin | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427
Katie, Caroline | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
Mallory, Damani, Melissa | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
Katie | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449
Francis, Vignesh | 19 | 0.38 | 50 | 0.069 | 0.135 | 0.245 | 0.515

      Let’s plot:

1. These 33 confidence intervals for \(p\): from lower_ci to upper_ci
2. The true population proportion \(p = 900 / 2400 = 0.375\) with a red vertical line

A sketch of code that could produce such a plot appears below.
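This is a minimal sketch (not the book's exact code), assuming the conf_ints data frame created above:

ggplot(conf_ints, aes(y = group)) +
  geom_segment(aes(x = lower_ci, xend = upper_ci, yend = group)) +
  geom_point(aes(x = p_hat)) +
  geom_vline(xintercept = 900 / 2400, color = "red") +
  labs(x = "Proportion red", y = "Group")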
Figure 9.2: 33 confidence intervals based on 33 tactile samples of size n=50

      +
      +

      We see that:

• In 31 cases, the confidence intervals "capture" the true \(p = 900 / 2400 = 0.375\)
• In 2 cases, the confidence intervals do not "capture" the true \(p = 900 / 2400 = 0.375\)

Thus, the confidence intervals capture the true proportion \(31 / 33 = 93.939\%\) of the time using this theory-based methodology.


      Confidence intervals based on 100 virtual samples

      +

Suppose, however, that we repeated the above process virtually rather than tactilely. We'll do this only 100 times (instead of 1000 like we did before) so that the results can fit on the screen. Again, the steps for computing a 95% confidence interval for \(p\) are:

      +
1. Collect a sample of size \(n = 50\) as we did in Chapter 8
2. Compute \(\widehat{p}\): the sample proportion red of these \(n = 50\) balls
3. Compute the standard error \(\text{SE} = \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
4. Compute the margin of error \(\text{MoE} = 1.96 \cdot \text{SE} = 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
5. Compute both end points of the confidence interval:
  • lower_ci: \(\widehat{p} - \text{MoE} = \widehat{p} - 1.96 \cdot \text{SE} = \widehat{p} - 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
  • upper_ci: \(\widehat{p} + \text{MoE} = \widehat{p} + 1.96 \cdot \text{SE} = \widehat{p} + 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
      +

      Run the following three steps, being sure to View() the resulting data frame after each step so you can convince yourself of what’s going on:

      +
      # First: Take 100 virtual samples of n=50 balls
      +virtual_samples <- bowl %>% 
      +  rep_sample_n(size = 50, reps = 100)
      +
      +# Second: For each virtual sample compute the proportion red
      +virtual_prop_red <- virtual_samples %>% 
      +  group_by(replicate) %>% 
      +  summarize(red = sum(color == "red")) %>% 
      +  mutate(prop_red = red / 50)
      +
      +# Third: Compute the 95% confidence interval as above
      +virtual_prop_red <- virtual_prop_red %>% 
      +  rename(p_hat = prop_red) %>% 
      +  mutate(
      +    n = 50,
      +    SE = sqrt(p_hat*(1-p_hat)/n),
      +    MoE = 1.96 * SE,
      +    lower_ci = p_hat - MoE,
      +    upper_ci = p_hat + MoE
      +  )
      +

      Here are the results:

      +
Figure 9.3: 100 confidence intervals based on 100 virtual samples of size n=50

      +
      +

We see that of our 100 confidence intervals based on samples of size \(n = 50\), 96 of them captured the true \(p = 900/2400\), whereas 4 of them missed. As we create more and more confidence intervals based on more and more samples, about 95% of these intervals will capture the true proportion. In other words, our procedure is "95% reliable."

      +

Theoretical methods like this have largely been used in the past since we didn't have the computing power to perform simulation-based methods such as bootstrapping. They are still commonly used, though, and if the normality assumptions are met, they provide a nice option for finding confidence intervals and performing hypothesis tests, as we will see in Chapter 10.


      9.7 EXAMPLE: Comparing two proportions

      +

      If you see someone else yawn, are you more likely to yawn? In an episode of the show Mythbusters, they tested the myth that yawning is contagious. The snippet from the show is available to view in the United States on the Discovery Network website here. More information about the episode is also available on IMDb here.

      +

      Fifty adults who thought they were being considered for an appearance on the show were interviewed by a show recruiter (“confederate”) who either yawned or did not. Participants then sat by themselves in a large van and were asked to wait. While in the van, the Mythbusters watched via hidden camera to see if the unaware participants yawned. The data frame containing the results is available at mythbusters_yawn in the moderndive package. Let’s check it out.

      +
      mythbusters_yawn
      +
      # A tibble: 50 x 3
      +    subj group   yawn 
      +   <int> <chr>   <chr>
      + 1     1 seed    yes  
      + 2     2 control yes  
      + 3     3 seed    no   
      + 4     4 seed    yes  
      + 5     5 seed    no   
      + 6     6 control no   
      + 7     7 seed    yes  
      + 8     8 control no   
      + 9     9 control no   
      +10    10 seed    no   
      +# … with 40 more rows
      +
• The participant ID is stored in the subj variable with values of 1 to 50.
• The group variable is either "seed" for when a confederate was trying to influence the participant or "control" if a confederate did not interact with the participant.
• The yawn variable is either "yes" if the participant yawned or "no" if the participant did not yawn.

      We can use the janitor package to get a glimpse into this data in a table format:

      +
      mythbusters_yawn %>% 
      +  tabyl(group, yawn) %>% 
      +  adorn_percentages() %>% 
      +  adorn_pct_formatting() %>% 
      +  # To show original counts
      +  adorn_ns()
      +
         group         no        yes
      + control 75.0% (12) 25.0%  (4)
      +    seed 70.6% (24) 29.4% (10)
      +

      We are interested in comparing the proportion of those that yawned after seeing a seed versus those that yawned with no seed interaction. We’d like to see if the difference between these two proportions is significantly larger than 0. If so, we’d have evidence to support the claim that yawning is contagious based on this study.

      +

      In looking over this problem, we can make note of some important details to include in our infer pipeline:

      +
• We are calling a success having a yawn value of "yes".
• Our response variable will always correspond to the variable used in the success argument, so the response variable is yawn.
• The explanatory variable is the other variable of interest here: group.

To summarize, we are looking to examine the relationship between yawning and whether or not the participant saw a "seed" yawn.

      +
      +

      9.7.1 Compute the point estimate

      +
      mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group)
      +
      Error: A level of the response variable `yawn` needs to be specified for the `success` argument in `specify()`.
      +

      Note that the success argument must be specified in situations such as this where the response variable has only two levels.

      +
      mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes")
      +
      Response: yawn (factor)
      +Explanatory: group (factor)
      +# A tibble: 50 x 2
      +   yawn  group  
      +   <fct> <fct>  
      + 1 yes   seed   
      + 2 yes   control
      + 3 no    seed   
      + 4 yes   seed   
      + 5 no    seed   
      + 6 no    control
      + 7 yes   seed   
      + 8 no    control
      + 9 no    control
      +10 no    seed   
      +# … with 40 more rows
      +

      We next want to calculate the statistic of interest for our sample. This corresponds to the difference in the proportion of successes.

      +
      mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes") %>% 
      +  calculate(stat = "diff in props")
      +
      Error: Statistic is based on a difference; specify the `order` in which to subtract the levels of the explanatory variable. `order = c("first", "second")` means `("first" - "second")`. Check `?calculate` for details.
      +

      We see another error here. To further check to make sure that R knows exactly what we are after, we need to provide the order in which R should subtract these proportions of successes. As the error message states, we’ll want to put "seed" first after c() and then "control": order = c("seed", "control"). Our point estimate is thus calculated:

      +
      obs_diff <- mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes") %>% 
      +  calculate(stat = "diff in props", order = c("seed", "control"))
      +obs_diff
      +
      # A tibble: 1 x 1
      +    stat
      +   <dbl>
      +1 0.0441
      +

This value represents the proportion of those that yawned after seeing a seed yawn (0.2941) minus the proportion of those that yawned without seeing a seed (0.25).


      9.7.2 Bootstrap distribution

      +

Our next step in building a confidence interval is to create a bootstrap distribution of statistics (differences in proportions of successes). We saw how it works with a single variable in computing bootstrap means in Subsection 9.1.3 and in computing bootstrap proportions in Section 9.6, but we haven't yet worked with bootstrapping involving multiple variables.

      +

      In the infer package, bootstrapping with multiple variables means that each row is potentially resampled. Let’s investigate this by looking at the first few rows of mythbusters_yawn:

      +
      head(mythbusters_yawn)
      +
      # A tibble: 6 x 3
      +   subj group   yawn 
      +  <int> <chr>   <chr>
      +1     1 seed    yes  
      +2     2 control yes  
      +3     3 seed    no   
      +4     4 seed    yes  
      +5     5 seed    no   
      +6     6 control no   
      +

      When we bootstrap this data, we are potentially pulling the subject’s readings multiple times. Thus, we could see the entries of "seed" for group and "no" for yawn together in a new row in a bootstrap sample. This is further seen by exploring the sample_n() function in dplyr on this smaller 6 row data frame comprised of head(mythbusters_yawn). The sample_n() function can perform this bootstrapping procedure and is similar to the rep_sample_n() function in infer, except that it is not repeated but rather only performs one sample with or without replacement.

      +
      set.seed(2019)
      +
      head(mythbusters_yawn) %>% 
      +  sample_n(size = 6, replace = TRUE)
      +
      # A tibble: 6 x 3
      +   subj group   yawn 
      +  <int> <chr>   <chr>
      +1     5 seed    no   
      +2     5 seed    no   
      +3     2 control yes  
      +4     4 seed    yes  
      +5     1 seed    yes  
      +6     1 seed    yes  
      +

      We can see that in this bootstrap sample generated from the first six rows of mythbusters_yawn, we have some rows repeated. The same is true when we perform the generate() step in infer as done below.

      +
      bootstrap_distribution <- mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes") %>% 
      +  generate(reps = 1000) %>% 
      +  calculate(stat = "diff in props", order = c("seed", "control"))
      +
      bootstrap_distribution %>% visualize(bins = 20)
      +

      +

      This distribution is roughly symmetric and bell-shaped but isn’t quite there. Let’s use the percentile-based method to compute a 95% confidence interval for the true difference in the proportion of those that yawn with and without a seed presented. The arguments are explicitly listed here but remember they are the defaults and simply get_ci() can be used.

      +
      bootstrap_distribution %>% 
      +  get_ci(type = "percentile", level = 0.95)
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1 -0.219   0.293
      +

The confidence interval shown here includes the value of 0. We'll see in Chapter 10 what this means in terms of this difference being statistically significant or not, but let's examine it a bit here first. The range of plausible values for the difference in the proportion of those that yawned with and without a seed is between -0.219 and 0.293.

      +

      Therefore, we are not sure which proportion is larger. Some of the bootstrap statistics showed the proportion without a seed to be higher and others showed the proportion with a seed to be higher. If the confidence interval was entirely above zero, we would be relatively sure (about “95% confident”) that the seed group had a higher proportion of yawning than the control group.

      +

Note that this all relates to the importance of the order argument in the calculate() function. Since we specified "seed" and then "control", positive values for the statistic correspond to the "seed" proportion being higher, whereas negative values correspond to the "control" proportion being higher.

      +

We therefore have evidence via this confidence interval that the Mythbusters' "confirmed" conclusion that "yawning is contagious" is not statistically supported.

      +
      +

Learning check

      +
      +

      Practice problems to come soon!


      9.8 Conclusion

      +
      +

      9.8.1 What’s to come?

      +

      This chapter introduced the notions of bootstrapping and confidence intervals as ways to build intuition about population parameters using only the original sample information. We also concluded with a glimpse into statistical significance and we’ll dig much further into this in Chapter 10 up next!


      9.8.2 Script of R code

      +

      An R script file of all R code used in this chapter is available here.


      A Statistical Background

      +
      +

      A.1 Basic statistical terms

      +
      +

      A.1.1 Mean

      +

      The mean is the most commonly reported measure of center. It is commonly called the “average” though this term can be a little ambiguous. The mean is the sum of all of the data elements divided by how many elements there are. If we have \(n\) data points, the mean is given by: \[Mean = \frac{x_1 + x_2 + \cdots + x_n}{n}\]


      A.1.2 Median

      +

      The median is calculated by first sorting a variable’s data from smallest to largest. After sorting the data, the middle element in the list is the median. If the middle falls between two values, then the median is the mean of those two values.


      A.1.3 Standard deviation

      +

We will next discuss the standard deviation of a sample dataset pertaining to one variable. The formula can be a little intimidating at first, but it is important to remember that it is essentially a measure of how far we expect a given data value to be from its mean:

      +

      \[Standard \, deviation = \sqrt{\frac{(x_1 - Mean)^2 + (x_2 - Mean)^2 + \cdots + (x_n - Mean)^2}{n - 1}}\]


      A.1.4 Five-number summary

      +

The five-number summary consists of five values: minimum, first quantile (25th percentile), median (50th percentile), third quantile (75th percentile), and maximum. The quantiles are calculated as

      +
• first quantile (\(Q_1\)): the median of the first half of the sorted data
• third quantile (\(Q_3\)): the median of the second half of the sorted data

      The interquartile range is defined as \(Q_3 - Q_1\) and is a measure of how spread out the middle 50% of values is. The five-number summary is not influenced by the presence of outliers in the ways that the mean and standard deviation are. It is, thus, recommended for skewed datasets.
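These measures are all available as built-in R functions; here is a small sketch using a made-up vector of values (note that R's quantile() uses an interpolation rule, so its quartiles can differ slightly from the median-of-halves definition above):

x <- c(2, 4, 4, 5, 7, 9, 11, 30)   # made-up example values
mean(x)      # mean
median(x)    # median
sd(x)        # standard deviation
summary(x)   # five-number summary (plus the mean)
IQR(x)       # interquartile range Q3 - Q1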


      A.1.5 Distribution

      +

      The distribution of a variable/dataset corresponds to generalizing patterns in the dataset. It often shows how frequently elements in the dataset appear. It shows how the data varies and gives some information about where a typical element in the data might fall. Distributions are most easily seen through data visualization.


      A.1.6 Outliers

      +

      Outliers correspond to values in the dataset that fall far outside the range of “ordinary” values. In regards to a boxplot (by default), they correspond to values below \(Q_1 - (1.5 * IQR)\) or above \(Q_3 + (1.5 * IQR)\).
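Continuing the small sketch from the five-number summary section above, these default boxplot fences could be computed as:

q1 <- quantile(x, 0.25)
q3 <- quantile(x, 0.75)
iqr <- q3 - q1
c(lower_fence = q1 - 1.5 * iqr, upper_fence = q3 + 1.5 * iqr)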

      +

      Note that these terms (aside from Distribution) only apply to quantitative variables.


      B Inference Examples

      +

      This appendix is designed to provide you with examples of the five basic hypothesis tests and their corresponding confidence intervals. Traditional theory-based methods as well as computational-based methods are presented.

      +
      +

Note: This appendix is still under construction. If you would like to contribute, please check us out on GitHub at https://github.com/moderndive/moderndive_book.

      +

Please check out our sneak peek of infer below in the meantime. For more details on infer, visit https://infer.netlify.com/.

      +

      Needed packages

      +
      library(dplyr)
      +library(ggplot2)
      +library(infer)
      +library(knitr)
      +library(readr)
      +library(janitor)
      +
      +
      +

      B.1 Inference mind map

      +

      To help you better navigate and choose the appropriate analysis, we’ve created a mind map on http://coggle.it available here and below.

      +
Figure B.1: Mind map for Inference


      B.2 One mean

      +
      +

      B.2.1 Problem statement

      +

The National Survey of Family Growth conducted by the Centers for Disease Control gathers information on family life, marriage and divorce, pregnancy, infertility, use of contraception, and men's and women's health. One of the variables collected on this survey is the age at first marriage. 5,534 randomly sampled US women between 2006 and 2010 completed the survey. The women sampled here had been married at least once. Do we have evidence that the mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years? (Tweaked a bit from Diez, Barr, and Çetinkaya-Rundel 2014 [Chapter 4])


      B.2.2 Competing hypotheses

      +
      +

      In words

      +
• Null hypothesis: The mean age of first marriage for all US women from 2006 to 2010 is equal to 23 years.
• Alternative hypothesis: The mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years.

      In symbols (with annotations)

      +
• \(H_0: \mu = \mu_{0}\), where \(\mu\) represents the mean age of first marriage for all US women from 2006 to 2010 and \(\mu_0\) is 23.
• \(H_A: \mu > 23\)

      Set \(\alpha\)

      +

      It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.


      B.2.3 Exploring the sample data

      +
      age_at_marriage <- read_csv("https://moderndive.com/data/ageAtMar.csv")
      +
      age_summ <- age_at_marriage %>%
      +  summarize(sample_size = n(),
      +    mean = mean(age),
      +    sd = sd(age),
      +    minimum = min(age),
      +    lower_quartile = quantile(age, 0.25),
      +    median = median(age),
      +    upper_quartile = quantile(age, 0.75),
      +    max = max(age))
      +kable(age_summ)
sample_size | mean | sd | minimum | lower_quartile | median | upper_quartile | max
5534 | 23.4 | 4.72 | 10 | 20 | 23 | 26 | 43

      The histogram below also shows the distribution of age.

      +
      ggplot(data = age_at_marriage, mapping = aes(x = age)) +
      +  geom_histogram(binwidth = 3, color = "white")
      +

      +

      The observed statistic of interest here is the sample mean:

      +
      x_bar <- age_at_marriage %>% 
      +  specify(response = age) %>% 
      +  calculate(stat = "mean")
      +x_bar
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  23.4
      +
      +

      Guess about statistical significance

      +

      We are looking to see if the observed sample mean of 23.44 is statistically greater than \(\mu_0 = 23\). They seem to be quite close, but we have a large sample size here. Let’s guess that the large sample size will lead us to reject this practically small difference.


      B.2.4 Non-traditional methods

      +
      +

      Bootstrapping for hypothesis test

      +

      In order to look to see if the observed sample mean of 23.44 is statistically greater than \(\mu_0 = 23\), we need to account for the sample size. We also need to determine a process that replicates how the original sample of size 5534 was selected.

      +

      We can use the idea of bootstrapping to simulate the population from which the sample came and then generate samples from that simulated population to account for sampling variability. Recall how bootstrapping would apply in this context:

      +
1. Sample with replacement from our original sample of 5534 women and repeat this process 10,000 times,
2. calculate the mean for each of the 10,000 bootstrap samples created in Step 1,
3. combine all of these bootstrap statistics calculated in Step 2 into a boot_distn object, and
4. shift the center of this distribution over to the null value of 23. (This is needed since it will be centered at 23.44 via the process of bootstrapping.)
      +
      set.seed(2018)
      +null_distn_one_mean <- age_at_marriage %>% 
      +  specify(response = age) %>% 
      +  hypothesize(null = "point", mu = 23) %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "mean")
      +
      null_distn_one_mean %>% visualize()
      +

      +

      We can next use this distribution to observe our \(p\)-value. Recall this is a right-tailed test so we will be looking for values that are greater than or equal to 23.44 for our \(p\)-value.

      +
      null_distn_one_mean %>%
      +  visualize(obs_stat = x_bar, direction = "greater")
      +

      +
      +
      Calculate \(p\)-value
      +
      pvalue <- null_distn_one_mean %>%
      +  get_pvalue(obs_stat = x_bar, direction = "greater")
      +pvalue
      +
      # A tibble: 1 x 1
      +  p_value
      +    <dbl>
      +1       0
      +

      So our \(p\)-value is 0 and we reject the null hypothesis at the 5% level. You can also see this from the histogram above that we are far into the tail of the null distribution.


      Bootstrapping for confidence interval

      +

      We can also create a confidence interval for the unknown population parameter \(\mu\) using our sample data using bootstrapping. Note that we don’t need to shift this distribution since we want the center of our confidence interval to be our point estimate \(\bar{x}_{obs} = 23.44\).

      +
      boot_distn_one_mean <- age_at_marriage %>% 
      +  specify(response = age) %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "mean")
      +
      ci <- boot_distn_one_mean %>% 
      +  get_ci()
      +ci
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   23.3    23.6
      +
      boot_distn_one_mean %>% 
      +  visualize(endpoints = ci, direction = "between")
      +

      +

      We see that 23 is not contained in this confidence interval as a plausible value of \(\mu\) (the unknown population mean) and the entire interval is larger than 23. This matches with our hypothesis test results of rejecting the null hypothesis in favor of the alternative (\(\mu > 23\)).

      +

      Interpretation: We are 95% confident the true mean age of first marriage for all US women from 2006 to 2010 is between 23.316 and 23.565.


      B.2.5 Traditional methods

      +
      +

      Check conditions

      +

      Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.

      +
1. Independent observations: The observations are collected independently.

   The cases are selected independently through random sampling so this condition is met.

2. Approximately normal: The distribution of the response variable should be normal or the sample size should be at least 30.

   The histogram for the sample above does show some skew.
      +

      The Q-Q plot below also shows some skew.

      +
      ggplot(data = age_at_marriage, mapping = aes(sample = age)) +
      +  stat_qq()
      +

      +

      The sample size here is quite large though (\(n = 5534\)) so both conditions are met.


      Test statistic

      +

      The test statistic is a random variable based on the sample data. Here, we want to look at a way to estimate the population mean \(\mu\). A good guess is the sample mean \(\bar{X}\). Recall that this sample mean is actually a random variable that will vary as different samples are (theoretically, would be) collected. We are looking to see how likely is it for us to have observed a sample mean of \(\bar{x}_{obs} = 23.44\) or larger assuming that the population mean is 23 (assuming the null hypothesis is true). If the conditions are met and assuming \(H_0\) is true, we can “standardize” this original test statistic of \(\bar{X}\) into a \(T\) statistic that follows a \(t\) distribution with degrees of freedom equal to \(df = n - 1\):

      +

      \[ T =\dfrac{ \bar{X} - \mu_0}{ S / \sqrt{n} } \sim t (df = n - 1) \]

      +

      where \(S\) represents the standard deviation of the sample and \(n\) is the sample size.

      +
      +
      Observed test statistic
      +

While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and on understanding which formula for the test statistic applies. We can use the t_test() function to perform this analysis for us.

      +
      t_test_results <- age_at_marriage %>% 
      +  infer::t_test(formula = age ~ NULL,
      +       alternative = "greater",
      +       mu = 23)
      +t_test_results
      +
      # A tibble: 1 x 6
      +  statistic  t_df  p_value alternative lower_ci upper_ci
      +      <dbl> <dbl>    <dbl> <chr>          <dbl>    <dbl>
      +1      6.94  5533 2.25e-12 greater         23.3      Inf
      +

      We see here that the \(t_{obs}\) value is 6.936.
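To connect this back to the formula above, here is a minimal “by hand” sketch (assuming the age_at_marriage data frame loaded earlier in this appendix):

x_bar_obs <- mean(age_at_marriage$age)
s_obs <- sd(age_at_marriage$age)
n <- nrow(age_at_marriage)
# (x_bar - mu_0) / (s / sqrt(n)) should reproduce the observed t statistic of about 6.94
(x_bar_obs - 23) / (s_obs / sqrt(n))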

      +
      +
      +
      +

      Compute \(p\)-value

      +

The \(p\)-value—the probability of observing a \(t_{obs}\) value of 6.936 or more in our null distribution of a \(t\) with 5533 degrees of freedom—is essentially 0.
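As a quick check, this upper-tail probability can be computed directly from the \(t\) distribution (a sketch, using the degrees of freedom reported above):

# Upper-tail area beyond the observed statistic; essentially 0,
# matching the p_value of about 2.25e-12 reported by t_test()
pt(6.936, df = 5533, lower.tail = FALSE)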

      +
      +
      +

      State conclusion

      +

      We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that our observed sample mean was statistically greater than the hypothesized mean has supporting evidence here. Based on this sample, we have evidence that the mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years.

      +
      +
      +

      Confidence interval

      +
      t.test(x = age_at_marriage$age, 
      +       alternative = "two.sided",
      +       mu = 23)$conf
      +
      [1] 23.3 23.6
      +attr(,"conf.level")
      +[1] 0.95
      +
      +
      +
      +
      +

      B.2.6 Comparing results

      +

Observing the bootstrap distribution that was created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since this distribution looks very similar to a normal distribution. The conditions also being met (the large sample size was the driver here) gives us reason to expect that the traditional (formula-based) and non-traditional (computation-based) methods will lead to similar results.

      +
      +
      +
      +
      +
      +

      B.3 One proportion

      +
      +

      B.3.1 Problem statement

      +

      The CEO of a large electric utility claims that 80 percent of his 1,000,000 customers are satisfied with the service they receive. To test this claim, the local newspaper surveyed 100 customers, using simple random sampling. 73 were satisfied and the remaining were unsatisfied. Based on these findings from the sample, can we reject the CEO’s hypothesis that 80% of the customers are satisfied? [Tweaked a bit from http://stattrek.com/hypothesis-test/proportion.aspx?Tutorial=AP]

      +
      +
      +

      B.3.2 Competing hypotheses

      +
      +

      In words

      +
• Null hypothesis: The proportion of all customers of the large electric utility satisfied with the service they receive is equal to 0.80.

• Alternative hypothesis: The proportion of all customers of the large electric utility satisfied with the service they receive is different from 0.80.
      +
      +
      +

      In symbols (with annotations)

      +
• \(H_0: \pi = p_{0}\), where \(\pi\) represents the proportion of all customers of the large electric utility satisfied with the service they receive and \(p_0\) is 0.8.

• \(H_A: \pi \ne 0.8\)
      +
      +
      +

      Set \(\alpha\)

      +

      It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      +
      +
      +
      +

      B.3.3 Exploring the sample data

      +
      elec <- c(rep("satisfied", 73), rep("unsatisfied", 27)) %>% 
      +  as_data_frame() %>% 
      +  rename(satisfy = value)
      +

      The bar graph below also shows the distribution of satisfy.

      +
      ggplot(data = elec, aes(x = satisfy)) + 
      +  geom_bar()
      +

      +

      The observed statistic is computed as

      +
      p_hat <- elec %>% 
      +  specify(response = satisfy, success = "satisfied") %>% 
      +  calculate(stat = "prop")
      +p_hat
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  0.73
      +
      +

      Guess about statistical significance

      +

      We are looking to see if the sample proportion of 0.73 is statistically different from \(p_0 = 0.8\) based on this sample. They seem to be quite close, and our sample size is not huge here (\(n = 100\)). Let’s guess that we do not have evidence to reject the null hypothesis.

      +
      +
      +
      +
      +

      B.3.4 Non-traditional methods

      +
      +

      Simulation for hypothesis test

      +

      In order to look to see if 0.73 is statistically different from 0.8, we need to account for the sample size. We also need to determine a process that replicates how the original sample of size 100 was selected. We can use the idea of an unfair coin to simulate this process. We will simulate flipping an unfair coin (with probability of success 0.8 matching the null hypothesis) 100 times. Then we will keep track of how many heads come up in those 100 flips. Our simulated statistic matches with how we calculated the original statistic \(\hat{p}\): the number of heads (satisfied) out of our total sample of 100. We then repeat this process many times (say 10,000) to create the null distribution looking at the simulated proportions of successes:

      +
      set.seed(2018)
      +null_distn_one_prop <- elec %>% 
      +  specify(response = satisfy, success = "satisfied") %>% 
      +  hypothesize(null = "point", p = 0.8) %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "prop")
      +
      null_distn_one_prop %>% visualize()
      +

      +

      We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are 0.8 - 0.73 = 0.07 away from 0.8 in BOTH directions for our \(p\)-value:

      +
      null_distn_one_prop %>% 
      +  visualize(obs_stat = p_hat, direction = "both")
      +

      +
      +
      Calculate \(p\)-value
      +
      pvalue <- null_distn_one_prop %>% 
      +  get_pvalue(obs_stat = p_hat, direction = "both")
      +pvalue
      +
      # A tibble: 1 x 1
      +  p_value
      +    <dbl>
      +1  0.0813
      +

      So our \(p\)-value is 0.081 and we fail to reject the null hypothesis at the 5% level.

      +
      +
      +
      +

      Bootstrapping for confidence interval

      +

      We can also create a confidence interval for the unknown population parameter \(\pi\) using our sample data. To do so, we use bootstrapping, which involves

      +
1. sampling with replacement from our original sample of 100 survey respondents and repeating this process 10,000 times,

2. calculating the proportion of successes for each of the 10,000 bootstrap samples created in Step 1,

3. combining all of these bootstrap statistics calculated in Step 2 into a boot_distn object,

4. identifying the 2.5th and 97.5th percentiles of this distribution (corresponding to the 5% significance level chosen) to find a 95% confidence interval for \(\pi\), and

5. interpreting this confidence interval in the context of the problem.
      +
      boot_distn_one_prop <- elec %>% 
      +  specify(response = satisfy, success = "satisfied") %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "prop")
      +

      Just as we use the mean function for calculating the mean over a numerical variable, we can also use it to compute the proportion of successes for a categorical variable where we specify what we are calling a “success” after the ==. (Think about the formula for calculating a mean and how R handles logical statements such as satisfy == "satisfied" for why this must be true.)
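For instance, a minimal sketch of this idea applied to the original sample (assuming the elec data frame created above and the dplyr package loaded earlier):

elec %>% 
  summarize(prop_satisfied = mean(satisfy == "satisfied"))
# returns 0.73, the same value as p_hat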

      +
      ci <- boot_distn_one_prop %>% 
      +  get_ci()
      +ci
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   0.64    0.81
      +
      boot_distn_one_prop %>% 
      +  visualize(endpoints = ci, direction = "between")
      +

      +

      We see that 0.80 is contained in this confidence interval as a plausible value of \(\pi\) (the unknown population proportion). This matches with our hypothesis test results of failing to reject the null hypothesis.

      +

      Interpretation: We are 95% confident the true proportion of customers who are satisfied with the service they receive is between 0.64 and 0.81.

      +
      +
      +
      +
      +

      B.3.5 Traditional methods

      +
      +

      Check conditions

      +

      Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.

      +
1. Independent observations: The observations are collected independently.

   The cases are selected independently through random sampling so this condition is met.

2. Approximately normal: The number of expected successes and expected failures is at least 10.

   This condition is met since the expected counts, \(100 \cdot 0.8 = 80\) and \(100 \cdot 0.2 = 20\), are both greater than 10 (as are the observed counts of 73 and 27).
      +
      +
      +

      Test statistic

      +

The test statistic is a random variable based on the sample data. Here, we want to look at a way to estimate the population proportion \(\pi\). A good guess is the sample proportion \(\hat{P}\). Recall that this sample proportion is actually a random variable that will vary as different samples are (theoretically, would be) collected. We are looking to see how likely it is for us to have observed a sample proportion of \(\hat{p}_{obs} = 0.73\) or something more extreme assuming that the population proportion is 0.80 (assuming the null hypothesis is true). If the conditions are met and assuming \(H_0\) is true, we can standardize this original test statistic of \(\hat{P}\) into a \(Z\) statistic that follows a \(N(0, 1)\) distribution.

      +

      \[ Z =\dfrac{ \hat{P} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n} }} \sim N(0, 1) \]

      +
      +
      Observed test statistic
      +

While one could compute this observed test statistic by “hand” by plugging the observed values into the formula, the focus here is on the set-up of the problem and on understanding which formula for the test statistic applies. The calculation has been done in R below for completeness though:

      +
      p_hat <- 0.73
      +p0 <- 0.8
      +n <- 100
      +(z_obs <- (p_hat - p0) / sqrt( (p0 * (1 - p0)) / n))
      +
      [1] -1.75
      +

      We see here that the \(z_{obs}\) value is around -1.75. Our observed sample proportion of 0.73 is 1.75 standard errors below the hypothesized parameter value of 0.8.

      +
      +
      +
      +

      Visualize and compute \(p\)-value

      +
      elec %>% 
      +  specify(response = satisfy, success = "satisfied") %>% 
      +  hypothesize(null = "point", p = 0.8) %>% 
      +  calculate(stat = "z") %>% 
      +  visualize(method = "theoretical", obs_stat = z_obs, direction = "both")
      +

      +
      2 * pnorm(z_obs)
      +
      [1] 0.0801
      +

The \(p\)-value—the probability of observing a \(z_{obs}\) value of -1.75 or more extreme (in both directions) in our null distribution—is around 8%.

      +

      Note that we could also do this test directly using the prop.test function.

      +
      stats::prop.test(x = table(elec$satisfy),
      +       n = length(elec$satisfy),
      +       alternative = "two.sided",
      +       p = 0.8,
      +       correct = FALSE)
      +
      
      +    1-sample proportions test without continuity correction
      +
      +data:  table(elec$satisfy), null probability 0.8
      +X-squared = 3, df = 1, p-value = 0.08
      +alternative hypothesis: true p is not equal to 0.8
      +95 percent confidence interval:
      + 0.636 0.807
      +sample estimates:
      +   p 
      +0.73 
      +

prop.test does a \(\chi^2\) test here but this matches up exactly with what we would expect: \(\chi^2_{obs} = 3.06 = (-1.75)^2 = (z_{obs})^2\) and the \(p\)-values are the same because we are focusing on a two-tailed test.
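A quick sketch of that relationship, reusing the z_obs value computed above:

# Squaring the observed z statistic recovers the chi-squared statistic
# reported by prop.test() (about 3.06, shown rounded to 3 in the output above)
z_obs^2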

      +

      Note that the 95 percent confidence interval given above matches well with the one calculated using bootstrapping.

      +
      +
      +

      State conclusion

      +

We, therefore, do not have sufficient evidence to reject the null hypothesis. Our initial guess that our observed sample proportion was not statistically different from the hypothesized proportion has not been invalidated. Based on this sample, we do not have evidence that the proportion of all customers of the large electric utility satisfied with the service they receive is different from 0.80, at the 5% level.

      +
      +
      +
      +
      +

      B.3.6 Comparing results

      +

Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions also being met gives us reason to expect that the traditional (formula-based) and non-traditional (computation-based) methods will lead to similar results.

      +
      +
      +
      +
      +
      +

      B.4 Two proportions

      +
      +

      B.4.1 Problem statement

      +

A 2010 survey asked 827 randomly sampled registered voters in California “Do you support? Or do you oppose? Drilling for oil and natural gas off the Coast of California? Or do you not know enough to say?” Conduct a hypothesis test to determine if the data provide strong evidence that the proportion of college graduates who do not have an opinion on this issue is different than that of non-college graduates. (Tweaked a bit from Diez, Barr, and Çetinkaya-Rundel 2014 [Chapter 6])

      +
      +
      +

      B.4.2 Competing hypotheses

      +
      +

      In words

      +
• Null hypothesis: There is no association between having an opinion on drilling and having a college degree for all registered California voters in 2010.

• Alternative hypothesis: There is an association between having an opinion on drilling and having a college degree for all registered California voters in 2010.
      +
      +
      +

      Another way in words

      +
• Null hypothesis: The probability that a registered California voter in 2010 has no opinion on drilling is the same for college graduates as for non-college graduates.

• Alternative hypothesis: These parameter probabilities are different.
      +
      +
      +

      In symbols (with annotations)

      +
• \(H_0: \pi_{college} = \pi_{no\_college}\) or \(H_0: \pi_{college} - \pi_{no\_college} = 0\), where \(\pi\) represents the probability of not having an opinion on drilling.

• \(H_A: \pi_{college} - \pi_{no\_college} \ne 0\)
      +
      +
      +

      Set \(\alpha\)

      +

      It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      +
      +
      +
      +

      B.4.3 Exploring the sample data

      +
      offshore <- read_csv("https://moderndive.com/data/offshore.csv")
      +
      offshore %>% tabyl(college_grad, response)
      +
       college_grad no opinion opinion
      +           no        131     258
      +          yes        104     334
      +
      off_summ <- offshore %>% 
      +  group_by(college_grad) %>% 
      +  summarize(prop_no_opinion = mean(response == "no opinion"),
      +    sample_size = n())
      +
      ggplot(offshore, aes(x = college_grad, fill = response)) +
      +  geom_bar(position = "fill") +
      +  coord_flip()
      +

      +
      +

      Guess about statistical significance

      +

      We are looking to see if a difference exists in the size of the bars corresponding to no opinion for the plot. Based solely on the plot, we have little reason to believe that a difference exists since the bars seem to be about the same size, BUT…it’s important to use statistics to see if that difference is actually statistically significant!

      +
      +
      +
      +
      +

      B.4.4 Non-traditional methods

      +
      +

      Collecting summary info

      +

      The observed statistic is

      +
      d_hat <- offshore %>% 
      +  specify(response ~ college_grad, success = "no opinion") %>% 
      +  calculate(stat = "diff in props", order = c("yes", "no"))
      +d_hat
      +
      # A tibble: 1 x 1
      +     stat
      +    <dbl>
      +1 -0.0993
      +
      +
      +

      Randomization for hypothesis test

      +

In order to see if the observed sample proportion of no opinion for non-college graduates of 0.337 is statistically different from that for college graduates of 0.237, we need to account for the sample sizes. Note that this is the same as looking to see if \(\hat{p}_{grad} - \hat{p}_{nograd}\) is statistically different than 0. We also need to determine a process that replicates how the original group sizes of 389 and 438 were selected.

      +

      We can use the idea of randomization testing (also known as permutation testing) to simulate the population from which the sample came (with two groups of different sizes) and then generate samples using shuffling from that simulated population to account for sampling variability.

      +
      set.seed(2018)
      +null_distn_two_props <- offshore %>% 
      +  specify(response ~ college_grad, success = "no opinion") %>%
      +  hypothesize(null = "independence") %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "diff in props", order = c("yes", "no"))
      +
      null_distn_two_props %>% visualize()
      +

      +

We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are less than or equal to -0.099 or greater than or equal to 0.099 for our \(p\)-value.

      +
      null_distn_two_props %>% 
      +  visualize(obs_stat = d_hat, direction = "two_sided")
      +

      +
      +
      Calculate \(p\)-value
      +
      pvalue <- null_distn_two_props %>% 
      +  get_pvalue(obs_stat = d_hat, direction = "two_sided")
      +pvalue
      +
      # A tibble: 1 x 1
      +  p_value
      +    <dbl>
      +1  0.0021
      +

So our \(p\)-value is 0.002 and we reject the null hypothesis at the 5% level. You can also see from the histogram above that we are far into the tails of the null distribution.

      +
      +
      +
      +

      Bootstrapping for confidence interval

      +

      We can also create a confidence interval for the unknown population parameter \(\pi_{college} - \pi_{no\_college}\) using our sample data with bootstrapping.

      +
      boot_distn_two_props <- offshore %>% 
      +  specify(response ~ college_grad, success = "no opinion") %>%
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "diff in props", order = c("yes", "no"))
      +
      ci <- boot_distn_two_props %>% 
      +  get_ci()
      +ci
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1 -0.161 -0.0378
      +
      boot_distn_two_props %>% 
      +  visualize(endpoints = ci, direction = "between")
      +

      +

      We see that 0 is not contained in this confidence interval as a plausible value of \(\pi_{college} - \pi_{no\_college}\) (the unknown population parameter). This matches with our hypothesis test results of rejecting the null hypothesis. Since zero is not a plausible value of the population parameter, we have evidence that the proportion of college graduates in California with no opinion on drilling is different than that of non-college graduates.

      +

Interpretation: We are 95% confident the true proportion of college graduates in California with no opinion on offshore drilling is between 0.04 and 0.16 smaller than that of non-college graduates.

      +
      +
      +
      +
      +

      B.4.5 Traditional methods

      +
      +
      +

      B.4.6 Check conditions

      +

      Remember that in order to use the short-cut (formula-based, theoretical) approach, we need to check that some conditions are met.

      +
        +
1. Independent observations: Each case that was selected must be independent of all the other cases selected.

   This condition is met since cases were selected at random to observe.

2. Sample size: The number of pooled successes and pooled failures must be at least 10 for each group.

   We need to first figure out the pooled success rate: \[\hat{p}_{obs} = \dfrac{131 + 104}{827} = 0.28.\] We now determine expected (pooled) success and failure counts (see the sketch after this list):

   \(0.28 \cdot (131 + 258) = 108.92\), \(0.72 \cdot (131 + 258) = 280.08\)

   \(0.28 \cdot (104 + 334) = 122.64\), \(0.72 \cdot (104 + 334) = 315.36\)

   All of these expected counts exceed 10, so this condition is met.

3. Independent selection of samples: The cases are not paired in any meaningful way.

   We have no reason to suspect that a college graduate selected would have any relationship to a non-college graduate selected.
      +
      +
      +
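Here is the sketch referenced in condition 2 above: a quick illustrative computation of the pooled rate and expected counts, using the cell counts from the table:

p_pooled <- (131 + 104) / 827      # pooled "no opinion" rate, roughly 0.28
n_no_grad <- 131 + 258             # 389 non-college graduates
n_grad <- 104 + 334                # 438 college graduates
p_pooled * c(n_no_grad, n_grad)        # expected successes in each group
(1 - p_pooled) * c(n_no_grad, n_grad)  # expected failures in each group; all exceed 10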

      B.4.7 Test statistic

      +

The test statistic is a random variable based on the sample data. Here, we are interested in seeing if our observed difference in sample proportions corresponding to no opinion on drilling (\(\hat{p}_{college, obs} - \hat{p}_{no\_college, obs} = -0.099\)) is statistically different than 0. Assuming that conditions are met and the null hypothesis is true, we can use the standard normal distribution to standardize the difference in sample proportions (\(\hat{P}_{college} - \hat{P}_{no\_college}\)) using the standard error of \(\hat{P}_{college} - \hat{P}_{no\_college}\) and the pooled estimate:

      +

      \[ Z =\dfrac{ (\hat{P}_1 - \hat{P}_2) - 0}{\sqrt{\dfrac{\hat{P}(1 - \hat{P})}{n_1} + \dfrac{\hat{P}(1 - \hat{P})}{n_2} }} \sim N(0, 1) \] where \(\hat{P} = \dfrac{\text{total number of successes} }{ \text{total number of cases}}.\)

      +
      +

      Observed test statistic

      +

While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and on understanding which formula for the test statistic applies. We can use the infer pipeline to calculate the standardized statistic for us (prop.test could also be used to run the full test directly, as in the one-proportion example).

      +
      z_hat <- offshore %>% 
      +  specify(response ~ college_grad, success = "no opinion") %>% 
      +  calculate(stat = "z", order = c("yes", "no"))
      +z_hat
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1 -3.16
      +

      The observed difference in sample proportions is 3.16 standard deviations smaller than 0.
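If you want to see where that value comes from, here is a minimal “by hand” sketch of the standardization, recomputing the group and pooled proportions from the table counts above:

p_grad <- 104 / 438                 # proportion of "no opinion" among college graduates
p_no_grad <- 131 / 389              # proportion of "no opinion" among non-college graduates
p_pooled <- (104 + 131) / 827       # pooled proportion under the null hypothesis
se <- sqrt(p_pooled * (1 - p_pooled) * (1 / 438 + 1 / 389))
(p_grad - p_no_grad) / se           # roughly -3.16, matching the infer output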

      +

      The \(p\)-value—the probability of observing a \(Z\) value of -3.16 or more extreme in our null distribution—is 0.0016. This can also be calculated in R directly:

      +
      2 * pnorm(-3.16, lower.tail = TRUE)
      +
      [1] 0.00158
      +
      +
      +
      +

      B.4.8 State conclusion

      +

      We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that a statistically significant difference did not exist in the proportions of no opinion on offshore drilling between college educated and non-college educated Californians was not validated. We do have evidence to suggest that there is a dependency between college graduation and position on offshore drilling for Californians.

      +
      +
      +
      +

      B.4.9 Comparing results

      +

Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions also being met gives us reason to expect that the traditional (formula-based) and non-traditional (computation-based) methods will lead to similar results.

      +
      +
      +
      +
      +
      +

      B.5 Two means (independent samples)

      +
      +

      B.5.1 Problem statement

      +

Average income varies from one region of the country to another, and it often reflects both lifestyles and regional living expenses. Suppose a new graduate is considering a job in two locations, Cleveland, OH and Sacramento, CA, and he wants to see whether the average income in one of these cities is higher than the other. He would like to conduct a hypothesis test based on two randomly selected samples from the 2000 Census. (Tweaked a bit from Diez, Barr, and Çetinkaya-Rundel 2014 [Chapter 5])

      +
      +
      +

      B.5.2 Competing hypotheses

      +
      +

      In words

      +
• Null hypothesis: There is no association between income and location (Cleveland, OH and Sacramento, CA).

• Alternative hypothesis: There is an association between income and location (Cleveland, OH and Sacramento, CA).
      +
      +
      +

      Another way in words

      +
• Null hypothesis: The mean income is the same for both cities.

• Alternative hypothesis: The mean income is different for the two cities.
      +
      +
      +

      In symbols (with annotations)

      +
• \(H_0: \mu_{sac} = \mu_{cle}\) or \(H_0: \mu_{sac} - \mu_{cle} = 0\), where \(\mu\) represents the average income.

• \(H_A: \mu_{sac} - \mu_{cle} \ne 0\)
      +
      +
      +

      Set \(\alpha\)

      +

      It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      +
      +
      +
      +

      B.5.3 Exploring the sample data

      +
      cle_sac <- read.delim("https://moderndive.com/data/cleSac.txt") %>%
      +  rename(metro_area = Metropolitan_area_Detailed,
      +         income = Total_personal_income) %>%
      +  na.omit()
      +
      inc_summ <- cle_sac %>% group_by(metro_area) %>%
      +  summarize(sample_size = n(),
      +    mean = mean(income),
      +    sd = sd(income),
      +    minimum = min(income),
      +    lower_quartile = quantile(income, 0.25),
      +    median = median(income),
      +    upper_quartile = quantile(income, 0.75),
      +    max = max(income))
      +kable(inc_summ)
metro_area      sample_size   mean     sd  minimum  lower_quartile  median  upper_quartile     max
Cleveland_ OH           212  27467  27681        0            8475   21000           35275  152400
Sacramento_ CA          175  32428  35774        0            8050   20000           49350  206900
      +

      The boxplot below also shows the mean for each group highlighted by the red dots.

      +
      ggplot(cle_sac, aes(x = metro_area, y = income)) +
      +  geom_boxplot() +
      +  stat_summary(fun.y = "mean", geom = "point", color = "red")
      +

      +
      +

      Guess about statistical significance

      +

      We are looking to see if a difference exists in the mean income of the two levels of the explanatory variable. Based solely on the boxplot, we have reason to believe that no difference exists. The distributions of income seem similar and the means fall in roughly the same place.

      +
      +
      +
      +
      +

      B.5.4 Non-traditional methods

      +
      +

      Collecting summary info

      +

      We now compute the observed statistic:

      +
      d_hat <- cle_sac %>% 
      +  specify(income ~ metro_area) %>% 
      +  calculate(stat = "diff in means", 
      +            order = c("Sacramento_ CA", "Cleveland_ OH"))
      +d_hat
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1 4960.
      +
      +
      +

      Randomization for hypothesis test

      +

In order to see if the observed sample mean for Sacramento of 32427.543 is statistically different from that for Cleveland of 27467.066, we need to account for the sample sizes. Note that this is the same as looking to see if \(\bar{x}_{sac} - \bar{x}_{cle}\) is statistically different than 0. We also need to determine a process that replicates how the original group sizes of 212 and 175 were selected.

      +

      We can use the idea of randomization testing (also known as permutation testing) to simulate the population from which the sample came (with two groups of different sizes) and then generate samples using shuffling from that simulated population to account for sampling variability.

      +
      set.seed(2018)
      +null_distn_two_means <- cle_sac %>% 
      +  specify(income ~ metro_area) %>% 
      +  hypothesize(null = "independence") %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "diff in means",
      +            order = c("Sacramento_ CA", "Cleveland_ OH"))
      +
      null_distn_two_means %>% visualize()
      +

      +

      We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are greater than or equal to 4960.477 or less than or equal to -4960.477 for our \(p\)-value.

      +
      null_distn_two_means %>% 
      +  visualize(obs_stat = d_hat, direction = "both")
      +

      +
      +
      Calculate \(p\)-value
      +
      pvalue <- null_distn_two_means %>% 
      +  get_pvalue(obs_stat = d_hat, direction = "both")
      +pvalue
      +
      # A tibble: 1 x 1
      +  p_value
      +    <dbl>
      +1   0.124
      +

So our \(p\)-value is 0.124 and we fail to reject the null hypothesis at the 5% level. You can also see from the histogram above that we are not very far into the tails of the null distribution.

      +
      +
      +
      +

      Bootstrapping for confidence interval

      +

We can also create a confidence interval for the unknown population parameter \(\mu_{sac} - \mu_{cle}\) using our sample data with bootstrapping. Here we will bootstrap each of the groups with replacement instead of shuffling. This is done using the groups argument in the resample function to fix the size of each group to be the same as the original group sizes of 175 for Sacramento and 212 for Cleveland.

      +
      boot_distn_two_means <- cle_sac %>% 
      +  specify(income ~ metro_area) %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "diff in means",
      +            order = c("Sacramento_ CA", "Cleveland_ OH"))
      +
      ci <- boot_distn_two_means %>% 
      +  get_ci()
      +ci
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1 -1446.  11308.
      +
      boot_distn_two_means %>% 
      +  visualize(endpoints = ci, direction = "between")
      +

      +

      We see that 0 is contained in this confidence interval as a plausible value of \(\mu_{sac} - \mu_{cle}\) (the unknown population parameter). This matches with our hypothesis test results of failing to reject the null hypothesis. Since zero is a plausible value of the population parameter, we do not have evidence that Sacramento incomes are different than Cleveland incomes.

      +

Interpretation: We are 95% confident the true mean yearly income for those living in Sacramento is between 1445.53 dollars lower and 11307.82 dollars higher than for Cleveland.

      +

      Note: You could also use the null distribution based on randomization with a shift to have its center at \(\bar{x}_{sac} - \bar{x}_{cle} = \$4960.48\) instead of at 0 and calculate its percentiles. The confidence interval produced via this method should be comparable to the one done using bootstrapping above.

      +
      +
      +
      +
      +

      B.5.5 Traditional methods

      +
      +
      Check conditions
      +

      Remember that in order to use the short-cut (formula-based, theoretical) approach, we need to check that some conditions are met.

      +
1. Independent observations: The observations are independent in both groups.

   This condition is met since the cases are randomly selected from each city.

2. Approximately normal: The distribution of the response for each group should be normal or the sample sizes should be at least 30.
      +
      ggplot(cle_sac, aes(x = income)) +
      +  geom_histogram(color = "white", binwidth = 20000) +
      +  facet_wrap(~ metro_area)
      +

      +

We have some reason to doubt the normality assumption here since both histograms show some deviation from a normal model. The sample sizes for each group are greater than 100 though, so the condition should still be met.

      +
3. Independent samples: The samples should be collected without any natural pairing.

   There is no mention of there being a relationship between those selected in Cleveland and in Sacramento.
      +
      +
      +
      +

      B.5.6 Test statistic

      +

      The test statistic is a random variable based on the sample data. Here, we are interested in seeing if our observed difference in sample means (\(\bar{x}_{sac, obs} - \bar{x}_{cle, obs}\) = 4960.477) is statistically different than 0. Assuming that conditions are met and the null hypothesis is true, we can use the \(t\) distribution to standardize the difference in sample means (\(\bar{X}_{sac} - \bar{X}_{cle}\)) using the approximate standard error of \(\bar{X}_{sac} - \bar{X}_{cle}\) (invoking \(S_{sac}\) and \(S_{cle}\) as estimates of unknown \(\sigma_{sac}\) and \(\sigma_{cle}\)).

      +

      \[ T =\dfrac{ (\bar{X}_1 - \bar{X}_2) - 0}{ \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}} } \sim t (df = min(n_1 - 1, n_2 - 1)) \] where 1 = Sacramento and 2 = Cleveland with \(S_1^2\) and \(S_2^2\) the sample variance of the incomes of both cities, respectively, and \(n_1 = 175\) for Sacramento and \(n_2 = 212\) for Cleveland.

      +
      +

      Observed test statistic

      +

Note that we could also do (ALMOST) this test directly using the t.test function; its x and y arguments are expected to both be numeric vectors, so we would need to appropriately filter our datasets first. Here we instead use the infer pipeline to calculate the observed test statistic:

      +
      cle_sac %>% 
      +  specify(income ~ metro_area) %>% 
      +  calculate(stat = "t",
      +            order = c("Cleveland_ OH", "Sacramento_ CA"))
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1 -1.50
      + +

      We see here that the observed test statistic value is around -1.5.

      +

While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and on understanding which formula for the test statistic applies.
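For completeness, here is a minimal sketch of that “by hand” calculation using the group summaries stored in inc_summ above (group 1 is Cleveland and group 2 is Sacramento, since the groups appear in alphabetical order):

x_bar <- inc_summ$mean
s <- inc_summ$sd
n <- inc_summ$sample_size
# (x_bar_1 - x_bar_2) / sqrt(s_1^2/n_1 + s_2^2/n_2) reproduces the statistic of about -1.50
(x_bar[1] - x_bar[2]) / sqrt(s[1]^2 / n[1] + s[2]^2 / n[2])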

      + + +
      +
      +
      +

      B.5.7 Compute \(p\)-value

      +

The \(p\)-value—the probability of observing a \(t_{174}\) value of -1.501 or more extreme (in both directions) in our null distribution—is 0.13. This can also be calculated in R directly:

      +
      2 * pt(-1.501, df = min(212 - 1, 175 - 1), lower.tail = TRUE)
      +
      [1] 0.135
      +

      We can also approximate by using the standard normal curve:

      +
      2 * pnorm(-1.501)
      +
      [1] 0.133
      +

Note that a theory-based 95 percent confidence interval for the difference in means (for example, from t.test) would be expected to match well with the one calculated using bootstrapping.

      +
      +
      +

      B.5.8 State conclusion

      +

We, therefore, do not have sufficient evidence to reject the null hypothesis. Our initial guess that a statistically significant difference did not exist in the means was backed by this statistical analysis. We do not have evidence to suggest that the true mean income differs between Cleveland, OH and Sacramento, CA based on this data.

      +
      +
      +
      +

      B.5.9 Comparing results

      +

Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions also being met gives us reason to expect that the traditional (formula-based) and non-traditional (computation-based) methods will lead to similar results.

      +
      +
      +
      +
      +
      +

      B.6 Two means (paired samples)

      +
      +

      Problem statement

      +

      Trace metals in drinking water affect the flavor and an unusually high concentration can pose a health hazard. Ten pairs of data were taken measuring zinc concentration in bottom water and surface water at 10 randomly selected locations on a stretch of river. Do the data suggest that the true average concentration in the surface water is smaller than that of bottom water? (Note that units are not given.) [Tweaked a bit from https://onlinecourses.science.psu.edu/stat500/node/51]

      +
      +
      +

      B.6.1 Competing hypotheses

      +
      +

      In words

      +
• Null hypothesis: The mean concentration in the bottom water is the same as that of the surface water at different paired locations.

• Alternative hypothesis: The mean concentration in the surface water is smaller than that of the bottom water at different paired locations.
      +
      +
      +

      In symbols (with annotations)

      +
• \(H_0: \mu_{diff} = 0\), where \(\mu_{diff}\) represents the mean difference in concentration for surface water minus bottom water.

• \(H_A: \mu_{diff} < 0\)
      +
      +
      +

      Set \(\alpha\)

      +

      It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      +
      +
      +
      +

      B.6.2 Exploring the sample data

      +
      zinc_tidy <- read_csv("https://moderndive.com/data/zinc_tidy.csv")
      +

      We want to look at the differences in surface - bottom for each location:

      +
      zinc_diff <- zinc_tidy %>% 
      +  group_by(loc_id) %>% 
      +  summarize(pair_diff = diff(concentration)) %>% 
      +  ungroup()
      +

      Next we calculate the mean difference as our observed statistic:

      +
      d_hat <- zinc_diff %>% 
      +  specify(response = pair_diff) %>% 
      +  calculate(stat = "mean")
      +d_hat
      +
      # A tibble: 1 x 1
      +     stat
      +    <dbl>
      +1 -0.0804
      +

      The histogram below also shows the distribution of pair_diff.

      +
      ggplot(zinc_diff, aes(x = pair_diff)) +
      +  geom_histogram(binwidth = 0.04, color = "white")
      +

      +
      +

      Guess about statistical significance

      +

We are looking to see if the sample paired mean difference of -0.08 is statistically less than 0. The observed difference seems quite close to 0, and we only have a small number of pairs here. Let’s guess that we will fail to reject the null hypothesis.

      +
      +
      +
      +
      +

      B.6.3 Non-traditional methods

      +
      +

      Bootstrapping for hypothesis test

      +

In order to see if the observed sample mean difference \(\bar{x}_{diff} = -0.0804\) is statistically less than 0, we need to account for the number of pairs. We also need to determine a process that replicates how the paired data was selected in a way similar to how we calculated our original difference in sample means.

      +

      Treating the differences as our data of interest, we next use the process of bootstrapping to build other simulated samples and then calculate the mean of the bootstrap samples. We hypothesize that the mean difference is zero.

      +

      This process is similar to comparing the One Mean example seen above, but using the differences between the two groups as a single sample with a hypothesized mean difference of 0.

      +
      set.seed(2018)
      +null_distn_paired_means <- zinc_diff %>% 
      +  specify(response = pair_diff) %>% 
      +  hypothesize(null = "point", mu = 0) %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "mean")
      +
      null_distn_paired_means %>% visualize()
      +

      +

We can next use this distribution to observe our \(p\)-value. Recall this is a left-tailed test so we will be looking for values that are less than or equal to -0.0804 for our \(p\)-value.

      +
      null_distn_paired_means %>% 
      +  visualize(obs_stat = d_hat, direction = "less")
      +

      +
      +
      Calculate \(p\)-value
      +
      pvalue <- null_distn_paired_means %>% 
      +  get_pvalue(obs_stat = d_hat, direction = "less")
      +pvalue
      +
      # A tibble: 1 x 1
      +  p_value
      +    <dbl>
      +1       0
      +

So our \(p\)-value is essentially 0 and we reject the null hypothesis at the 5% level. You can also see from the histogram above that we are far into the left tail of the null distribution.

      +
      +
      +
      +

      Bootstrapping for confidence interval

      +

We can also create a confidence interval for the unknown population parameter \(\mu_{diff}\) using our sample data (the calculated differences) with bootstrapping. This is similar to the bootstrapping done in a one sample mean case, except now our data is differences instead of raw numerical data. Note that this code is identical to the pipeline shown in the hypothesis test above except the hypothesize() function is not called.

      +
      boot_distn_paired_means <- zinc_diff %>% 
      +  specify(response = pair_diff) %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "mean")
      +
      ci <- boot_distn_paired_means %>% 
      +  get_ci()
      +ci
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1 -0.112 -0.0503
      +
      boot_distn_paired_means %>% 
      +  visualize(endpoints = ci, direction = "between")
      +

      +

      We see that 0 is not contained in this confidence interval as a plausible value of \(\mu_{diff}\) (the unknown population parameter). This matches with our hypothesis test results of rejecting the null hypothesis. Since zero is not a plausible value of the population parameter and since the entire confidence interval falls below zero, we have evidence that surface zinc concentration levels are lower, on average, than bottom level zinc concentrations.

      +

Interpretation: We are 95% confident the true mean zinc concentration on the surface is between 0.05 and 0.11 units smaller than on the bottom.

      +
      +
      +
      +
      +

      B.6.4 Traditional methods

      +
      +

      Check conditions

      +

      Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.

      +
1. Independent observations: The observations among pairs are independent.

   The locations are selected independently through random sampling so this condition is met.

2. Approximately normal: The distribution of the population of differences is normal or the number of pairs is at least 30.

   The histogram above does show some skew so we have reason to doubt the population being normal based on this sample. We also only have 10 pairs, which is fewer than the 30 needed. A theory-based test may not be valid here.
      +
      +
      +

      Test statistic

      +

The test statistic is a random variable based on the sample data. Here, we want to look at a way to estimate the population mean difference \(\mu_{diff}\). A good guess is the sample mean difference \(\bar{X}_{diff}\). Recall that this sample mean is actually a random variable that will vary as different samples are (theoretically, would be) collected. We are looking to see how likely it is for us to have observed a sample mean difference of \(\bar{x}_{diff, obs} = -0.0804\) or smaller assuming that the population mean difference is 0 (assuming the null hypothesis is true). If the conditions are met and assuming \(H_0\) is true, we can “standardize” this original test statistic of \(\bar{X}_{diff}\) into a \(T\) statistic that follows a \(t\) distribution with degrees of freedom equal to \(df = n - 1\):

      +

      \[ T =\dfrac{ \bar{X}_{diff} - 0}{ S_{diff} / \sqrt{n} } \sim t (df = n - 1) \]

      +

where \(S_{diff}\) represents the standard deviation of the sample differences and \(n\) is the number of pairs.

      +
      +
      Observed test statistic
      +

While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and on understanding which formula for the test statistic applies. We can use the t_test function on the differences to perform this analysis for us.

      +
      t_test_results <- zinc_diff %>% 
      +  infer::t_test(formula = pair_diff ~ NULL, 
      +         alternative = "less",
      +         mu = 0)
      +t_test_results
      +
      # A tibble: 1 x 6
      +  statistic  t_df  p_value alternative lower_ci upper_ci
      +      <dbl> <dbl>    <dbl> <chr>          <dbl>    <dbl>
      +1     -4.86     9 0.000446 less            -Inf  -0.0501
      +

      We see here that the \(t_{obs}\) value is -4.864.
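Here is a minimal sketch of the corresponding “by hand” calculation (assuming the zinc_diff data frame created above):

d_bar <- mean(zinc_diff$pair_diff)
s_diff <- sd(zinc_diff$pair_diff)
n_pairs <- nrow(zinc_diff)
# d_bar / (s_diff / sqrt(n)) reproduces the t statistic of about -4.86
d_bar / (s_diff / sqrt(n_pairs))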

      +
      +
      +
      +

      Compute \(p\)-value

      +

The \(p\)-value—the probability of observing a \(t_{obs}\) value of -4.864 or less in our null distribution of a \(t\) with 9 degrees of freedom—is essentially 0. This can also be calculated in R directly:

      +
      pt(-4.8638, df = nrow(zinc_diff) - 1, lower.tail = TRUE)
      +
      [1] 0.000446
      +
      +
      +

      State conclusion

      +

      We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that our observed sample mean difference was not statistically less than the hypothesized mean of 0 has been invalidated here. Based on this sample, we have evidence that the mean concentration in the bottom water is greater than that of the surface water at different paired locations.

      +
      +
      +
      +
      +

      B.6.5 Comparing results

      +

Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions were not met since the number of pairs was small, but the sample data was not highly skewed. Using any of the methods, whether traditional (formula-based) or non-traditional (computation-based), leads to similar results here.

diff --git a/docs/previous_versions/v0.4.0/C-appendixC.html b/docs/previous_versions/v0.4.0/C-appendixC.html
new file mode 100644
index 000000000..c8c82b6fe
--- /dev/null
+++ b/docs/previous_versions/v0.4.0/C-appendixC.html
@@ -0,0 +1,693 @@

      C Reach for the Stars

      +
      +

      Needed packages

      +
      library(dplyr)
      +library(ggplot2)
      +library(knitr)
      +library(dygraphs)
      +library(nycflights13)
      +
      +
      +

      C.1 Sorted barplots

      +

      Building upon the example in Section 3.8:

      +
      flights_table <- table(flights$carrier)
      +flights_table
      +
      
      +   9E    AA    AS    B6    DL    EV    F9    FL    HA    MQ    OO    UA    US 
      +18460 32729   714 54635 48110 54173   685  3260   342 26397    32 58665 20536 
      +   VX    WN    YV 
      + 5162 12275   601 
      +

      We can sort this table from highest to lowest counts by using the sort function:

      +
      sorted_flights <- sort(flights_table, decreasing = TRUE)
      +names(sorted_flights)
      +
       [1] "UA" "B6" "EV" "DL" "AA" "MQ" "US" "9E" "WN" "VX" "FL" "AS" "F9" "YV" "HA"
      +[16] "OO"
      +

      It is often preferred for barplots to be ordered corresponding to the heights of the bars. This allows the reader to more easily compare the ordering of different airlines in terms of departed flights (Robbins 2013). We can also much more easily answer questions like “How many airlines have more departing flights than Southwest Airlines?”.

      +

We can use the sorted table of flight counts, stored as sorted_flights, to reorder the levels of carrier on the x-axis.

      +
      ggplot(data = flights, mapping = aes(x = carrier)) +
      +  geom_bar() +
      +  scale_x_discrete(limits = names(sorted_flights))
      +
Figure C.1: Number of flights departing NYC in 2013 by airline - Descending numbers

      +
      +

      The last addition here specifies the values of the horizontal x axis on a discrete scale to correspond to those given by the entries of sorted_flights.

      +
      +
      +

      C.2 Interactive graphics

      +
      +

      C.2.1 Interactive linegraphs

      +

      Another useful tool for viewing linegraphs such as this is the dygraph function in the dygraphs package in combination with the dyRangeSelector function. This allows us to zoom in on a selected range and get an interactive plot for us to work with:

      +
      library(dygraphs)
      +flights_day <- mutate(flights, date = as.Date(time_hour))
      +flights_summarized <- flights_day %>% 
      +  group_by(date) %>%
      +  summarize(median_arr_delay = median(arr_delay, na.rm = TRUE))
      +rownames(flights_summarized) <- flights_summarized$date
      +flights_summarized <- select(flights_summarized, -date)
      +dyRangeSelector(dygraph(flights_summarized))
      +
      + +


      +

      The syntax here is a little different than what we have covered so far. The dygraph function is expecting for the dates to be given as the rownames of the object. We then remove the date variable from the flights_summarized data frame since it is accounted for in the rownames. Lastly, we run the dygraph function on the new data frame that only contains the median arrival delay as a column and then provide the ability to have a selector to zoom in on the interactive plot via dyRangeSelector. (Note that this plot will only be interactive in the HTML version of this book.)

diff --git a/docs/previous_versions/v0.4.0/data/ageAtMar.csv b/docs/previous_versions/v0.4.0/data/ageAtMar.csv
new file mode 100755
index 000000000..b68e12a1b
--- /dev/null
+++ b/docs/previous_versions/v0.4.0/data/ageAtMar.csv
@@ -0,0 +1,5535 @@
+age
(5,534 age-at-first-marriage values omitted)
+23 +21 +22 +27 +16 +23 +20 +20 +20 +19 +19 +24 +18 +21 +23 +20 +21 +20 +20 +18 +20 +20 +22 +18 +33 +18 +15 +17 +21 +16 +36 +31 +17 +21 +17 +18 +17 +19 +23 +18 +17 +20 +16 +23 +20 +24 +18 +14 +17 +14 +17 +23 +18 +24 +23 +25 +20 +18 +15 +22 +26 +17 +19 +18 +34 +24 +18 +27 +20 +20 +19 +19 +20 +19 +25 +27 +23 +28 +22 +27 +24 +26 +15 +26 +28 +24 +33 +24 +23 +16 +30 +21 +22 +26 +23 +18 +28 +26 +31 +22 +27 +21 +20 +19 +29 +16 +24 +26 +21 +25 +19 +26 +29 +28 +24 +29 +28 +21 +17 +22 +26 +19 +34 +26 +19 +29 +24 +30 +16 +24 +25 +22 +24 +19 +22 +21 +23 +30 +20 +22 +27 +27 +28 +23 +24 +17 +31 +25 +25 +25 +22 +23 +17 +25 +29 +33 +19 +24 +33 +18 +27 +30 +15 +30 +17 +21 +25 +18 +28 +22 +23 +20 +18 +19 +32 +24 +25 +23 +26 +30 +24 +25 +25 +20 +24 +19 +22 +31 +26 +28 +28 +24 +19 +26 +18 +25 +17 +34 +19 +28 +20 +21 +21 +18 +18 +19 +21 +34 +20 +24 +16 +20 +22 +22 +21 +24 +23 +20 +19 +17 +19 +21 +33 +25 +18 +17 +29 +27 +27 +33 +22 +22 +23 +13 +25 +24 +21 +21 +32 +20 +21 +28 +20 +29 +25 +25 +28 +34 +26 +25 +24 +21 +25 +20 +21 +27 +27 +18 +23 +14 +27 +22 +24 +21 +26 +24 +23 +19 +20 +22 +22 +20 +30 +23 +28 +19 +21 +23 +26 +19 +27 +27 +22 +24 +25 +36 +19 +34 +35 +26 +21 +23 +33 +20 +23 +26 +21 +19 +24 +20 +28 +21 +37 +26 +21 +18 +20 +18 +43 +25 +19 +28 +19 +20 +25 +20 +21 +15 +21 +20 +21 +19 +29 +22 +22 +18 +20 +29 +29 +23 +27 +21 +20 +18 +35 +25 +23 +24 +18 +20 +19 +18 +16 +37 +26 +24 +33 +35 +23 +20 +22 +14 +24 +19 +19 +18 +29 +15 +17 +37 +22 +25 +19 +20 +32 +21 +19 +29 +21 +23 +16 +24 +20 +22 +18 +18 +19 +23 +39 +21 +19 +22 +24 +25 +28 +18 +16 +18 +21 +21 +18 +18 +20 +24 +23 +15 +19 +19 +22 +23 +27 +26 +25 +24 +22 +18 +17 +18 +26 +18 +24 +18 +23 +20 +24 +24 +21 +27 +27 +35 +24 +25 +23 +20 +24 +20 +25 +21 +24 +23 +25 +21 +20 +21 +20 +32 +24 +18 +28 +16 +19 +18 +23 +24 +25 +20 +23 +20 +29 +23 +18 +21 +21 +23 +23 +21 +22 +22 +21 +20 +28 +21 +22 +21 +21 +24 +20 +28 +17 +21 +18 +20 +19 +20 +23 +33 +19 +18 +25 +23 +24 +19 +23 +25 +21 +26 +27 +19 +28 +20 +34 +25 +20 +19 +22 +22 +30 +21 +24 +18 +20 +15 +19 +23 +24 +36 +18 +27 +21 +17 +21 +26 +18 +24 +31 +14 +30 +26 +23 +19 +16 +23 +19 +20 +28 +23 +23 +33 +34 +32 +21 +20 +18 +25 +26 +24 +27 +17 +31 +38 +22 +31 +20 +25 +23 +15 +24 +21 +20 +19 +15 +23 +24 +28 +20 +28 +27 +19 +24 +25 +19 +25 +29 +25 +22 +21 +26 +21 +25 +21 +18 +27 +25 +23 +22 +23 +23 +24 +22 +21 +22 +20 +23 +25 +23 +21 +20 +21 +21 +22 +26 +25 +18 +18 +25 +24 +20 +26 +21 +20 +23 +20 +20 +17 +17 +19 +23 +20 +19 +19 +21 +20 +26 +22 +22 +24 +28 +22 +25 +22 +19 +20 +21 +21 +21 +22 +28 +20 +21 +25 +22 +24 +30 +21 +19 +21 +24 +27 +21 +19 +22 +15 +18 +20 +21 +19 +22 +16 +21 +18 +23 +19 +19 +21 +24 +27 +21 +27 +24 +31 +20 +26 +20 +21 +18 +24 +21 +24 +19 +18 +24 +23 +33 +25 +22 +30 +28 +21 +29 +25 +25 +29 +27 +25 +27 +27 +25 +27 +27 +20 +22 +27 +36 +19 +16 +24 +18 +27 +26 +19 +23 +22 +22 +24 +24 +22 +26 +29 +23 +25 +25 +21 +24 +21 +24 +22 +17 +24 +26 +25 +19 +21 +21 +20 +20 +22 +24 +30 +26 +24 +29 +22 +28 +27 +32 +23 +19 +24 +28 +30 +25 +33 +30 +21 +18 +29 +32 +28 +34 +21 +22 +30 +21 +24 +25 +33 +18 +23 +24 +34 +26 +25 +22 +23 +26 +33 +27 +24 +25 +22 +29 +19 +26 +22 +23 +19 +18 +15 +20 +24 +18 +18 +21 +18 +18 +18 +19 +17 +31 +20 +16 +24 +20 +25 +25 +22 +18 +18 +26 +23 +40 +20 +19 +21 +19 +21 +23 +19 +25 +20 +22 +24 +20 +23 +29 +20 +23 +23 +19 +23 +25 +23 +24 +25 +22 +28 +23 +28 +23 +16 +24 +23 +20 +27 +25 +20 +25 +30 +31 +23 +19 +29 +18 +25 +22 +22 +20 +13 +38 +18 +22 +19 +20 +18 +28 +16 +25 +19 +24 +21 +21 +19 +18 +21 +21 +18 +21 +24 +17 +21 +20 +19 +19 +18 +24 +18 +25 +28 +18 +27 +19 +27 +19 +31 +19 +28 +21 +17 
+29 +21 +18 +26 +24 +31 +25 +23 +27 +22 +26 +27 +23 +20 +20 +27 +29 +21 +23 +35 +27 +19 +31 +34 +19 +23 +26 +27 +17 +19 +18 +19 +19 +20 +23 +24 +20 +21 +17 +18 +23 +21 +21 +24 +16 +19 +19 +16 +21 +17 +24 +19 +16 +21 +16 +22 +25 +42 +25 +22 +16 +25 +17 +23 +30 +31 +23 +26 +24 +18 +23 +28 +21 +21 +18 +19 +27 +21 +18 +24 +14 +21 +26 +28 +18 +19 +18 +36 +22 +21 +17 +18 +30 +21 +22 +23 +20 +21 +22 +26 +25 +22 +29 +21 +23 +18 +18 +25 +23 +19 +18 +29 +27 +22 +26 +26 +17 +26 +22 +30 +26 +16 +28 +26 +20 +19 +18 +23 +22 +35 +26 +21 +22 +23 +24 +23 +20 +22 +25 +21 +24 +33 +18 +22 +25 +33 +19 +20 +24 +24 +24 +28 +20 +32 +21 +23 +26 +25 +24 +23 +24 +30 +22 +28 +30 +19 +30 +23 +28 +20 +24 +28 +19 +22 +18 +24 +25 +22 +30 +24 +24 +19 +30 +27 +23 +32 +23 +29 +25 +17 +19 +18 +19 +18 +24 +22 +28 +24 +21 +27 +22 +23 +28 +24 +18 +23 +20 +22 +22 +17 +23 +23 +28 +22 +20 +24 +24 +24 +22 +26 +26 +33 +20 +21 +30 +26 +26 +21 +19 +20 +24 +34 +21 +18 +19 +23 +26 +29 +19 +25 +21 +22 +26 +28 +27 +27 +19 +22 +24 +20 +25 +18 +21 +21 +20 +19 +20 +26 +24 +20 +18 +27 +19 +21 +24 +23 +21 +27 +20 +26 +21 +18 +20 +23 +23 +24 +29 +20 +21 +18 +25 +22 +29 +18 +19 +30 +18 +25 +20 +22 +24 +27 +25 +25 +22 +18 +17 +19 +27 +28 +26 +20 +22 +24 +23 +23 +25 +20 +23 +27 +20 +24 +23 +25 +24 +19 +18 +22 +24 +23 +15 +19 +18 +22 +16 +18 +35 +22 +22 +20 +25 +20 +20 +25 +22 +37 +21 +18 +19 +18 +18 +27 +21 +24 +20 +20 +19 +22 +22 +23 +20 +18 +19 +22 +25 +25 +25 +20 +18 +20 +24 +21 +18 +19 +19 +21 +19 +20 +27 +27 +23 +24 +22 +19 +20 +22 +18 +19 +29 +16 +38 +24 +19 +23 +14 +36 +25 +19 +23 +30 +26 +28 +26 +26 +15 +22 +21 +20 +22 +21 +22 +19 +28 +18 +33 +25 +16 +24 +19 +24 +20 +24 +21 +25 +21 +20 +28 +19 +21 +24 +18 +18 +31 +18 +20 +19 +23 +19 +23 +25 +20 +24 +20 +21 +26 +22 +22 +25 +24 +21 +23 +25 +24 +18 +23 +25 +18 +26 +24 +21 +25 +23 +22 +28 +21 +24 +20 +26 +25 +19 +20 +24 +16 +25 +26 +31 +26 +20 +29 +23 +19 +24 +27 +22 +27 +23 +22 +24 +20 +19 +26 +23 +21 +19 +20 +31 +17 +18 +21 +17 +22 +22 +26 +26 +22 +18 +15 +19 +26 +23 +20 +15 +23 +18 +22 +21 +21 +21 +27 +19 +20 +28 +21 +39 +26 +22 +20 +24 +20 +20 +28 +30 +18 +22 +28 +20 +19 +19 +20 +27 +18 +24 +21 +20 +20 +32 +20 +22 +18 +22 +18 +30 +17 +17 +20 +23 +17 +24 +24 +16 +20 +20 +24 +26 +22 +19 +21 +28 +21 +26 +26 +17 +27 +26 +19 +33 +22 +18 +21 +21 +24 +16 +20 +22 +14 +22 +21 +21 +19 +24 +39 +20 +16 +25 +20 +26 +29 +23 +29 +26 +20 +20 +36 +30 +24 +23 +30 +27 +29 +26 +25 +23 +24 +28 +27 +18 +32 +18 +23 +19 +21 +21 +17 +27 +19 +26 +24 +21 +21 +27 +23 +23 +23 +23 +25 +21 +27 +20 +23 +21 +27 +20 +23 +23 +18 +16 +19 +19 +37 +19 +23 +22 +27 +26 +19 +22 +24 +19 +16 +17 +20 +22 +23 +18 +24 +19 +17 +29 +25 +21 +23 +23 +20 +19 +17 +21 +15 +24 +25 +18 +20 +23 +20 +22 +19 +27 +15 +24 +19 +16 +19 +16 +15 +14 +18 +16 +19 +17 +19 +18 +16 +18 +21 +18 +42 +20 +17 +17 +19 +18 +28 +16 +31 +29 +26 +28 +18 +17 +17 +17 +30 +23 +25 +19 +20 +19 +20 +20 +25 +26 +20 +24 +18 +27 +25 +20 +20 +22 +19 +25 +30 +22 +17 +19 +19 +21 +36 +17 +25 +17 +13 +20 +28 +21 +21 +26 +40 +24 +25 +33 +23 +35 +23 +19 +22 +18 +23 +27 +31 +19 +23 +27 +22 +18 +19 +18 +22 +21 +22 +37 +19 +22 +25 +27 +38 +33 +19 +23 +17 +41 +20 +20 +21 +34 +20 +20 +20 +15 +20 +30 +23 +16 +28 +18 +21 +16 +18 +18 +18 +26 +18 +18 +21 +20 +21 +18 +20 +17 +21 +21 +18 +22 +15 +22 +18 +22 +20 +24 +20 +17 +29 +25 +18 +23 +21 +18 +18 +21 +18 +23 +25 +20 +20 +20 +17 +20 +25 +18 +25 +24 +18 +20 +19 +27 +28 +21 +22 +28 +16 +17 +16 +19 +17 +29 +21 +22 +21 +18 +22 +27 +26 +22 +20 +20 +24 +19 +22 +18 +32 +21 +19 +21 +15 +28 +20 +25 +19 +24 +19 +19 +33 +39 +18 +21 +25 +19 +19 +23 
+21 +29 +19 +24 +22 +25 +21 +18 +24 +18 +21 +20 +18 +23 +33 +21 +19 +18 +26 +21 +17 +18 +34 +18 +21 +18 +19 +17 +32 +24 +21 +24 +20 +18 +22 +20 +17 +23 +21 +19 +26 +23 +26 +21 +23 +15 +21 +17 +28 +20 +28 +20 +22 +22 +24 +17 +32 +24 +16 +24 +23 +20 +27 +22 +42 +28 +18 +31 +22 +22 +19 +19 +22 +32 +15 +27 +23 +23 +18 +18 +22 +25 +20 +22 +22 +17 +21 +17 +20 +15 +19 +18 +26 +25 +18 +24 +26 +22 +18 +22 +17 +31 +18 +21 +31 +20 +26 +27 +25 +26 +27 +19 +18 +24 +18 +22 +23 +28 +28 +23 +26 +29 +28 +18 +20 +20 +15 +18 +23 +26 +20 +20 +23 +26 +19 +19 +20 +25 +21 +21 +24 +19 +20 +16 +14 +24 +19 +28 +20 +25 +31 +21 +22 +23 +19 +24 +19 +20 +19 +20 +22 +22 +27 +22 +26 +22 +14 +19 +18 +20 +27 +20 +20 +21 +21 +24 +24 +16 +25 +27 +22 +21 +31 +26 +20 +17 +21 +20 +19 +19 +21 +16 +21 +33 +22 +19 +25 +23 +23 +21 +22 +27 +20 +21 +23 +17 +23 +18 +28 +25 +23 +31 +35 +23 +20 +18 +24 +31 +19 +32 +19 +30 +19 +26 +19 +22 +16 +19 +21 +21 +40 +23 +26 +17 +20 +17 +31 +21 +22 +22 +18 +17 +22 +24 +25 +25 +23 +24 +23 +30 +21 +25 +32 +23 +27 +26 +22 +25 +34 +16 +22 +22 +18 +23 +23 +20 +28 +26 +26 +19 +34 +22 +28 +19 +24 +21 +19 diff --git a/docs/previous_versions/v0.4.0/data/cleSac.txt b/docs/previous_versions/v0.4.0/data/cleSac.txt new file mode 100755 index 000000000..20e7da082 --- /dev/null +++ b/docs/previous_versions/v0.4.0/data/cleSac.txt @@ -0,0 +1 @@ +Census_year State_FIPS_code Metropolitan_area_Detailed Age Sex Race_General Marital_status Total_personal_income 2000 California Sacramento_ CA 56 Male Japanese Married_ spouse present 40240 2000 California Sacramento_ CA 53 Female White Married_ spouse present 13600 2000 California Sacramento_ CA 17 Female Two major races Never married/single (N/A) 0 2000 California Sacramento_ CA 37 Female White Never married/single (N/A) 49000 2000 California Sacramento_ CA 40 Male White Never married/single (N/A) 38300 2000 California Sacramento_ CA 23 Male Other race_ nec Never married/single (N/A) 14000 2000 California Sacramento_ CA 40 Female Black/Negro Divorced 9000 2000 California Sacramento_ CA 11 Male Black/Negro Never married/single (N/A) 2000 California Sacramento_ CA 46 Male Black/Negro Married_ spouse present 40000 2000 California Sacramento_ CA 34 Female Black/Negro Married_ spouse present 18000 2000 California Sacramento_ CA 16 Male Black/Negro Never married/single (N/A) 0 2000 California Sacramento_ CA 11 Female Black/Negro Never married/single (N/A) 2000 California Sacramento_ CA 7 Female Black/Negro Never married/single (N/A) 2000 California Sacramento_ CA 23 Male White Never married/single (N/A) 65000 2000 California Sacramento_ CA 30 Female White Divorced 30000 2000 California Sacramento_ CA 35 Male White Married_ spouse present 61100 2000 California Sacramento_ CA 30 Male White Married_ spouse present 62000 2000 California Sacramento_ CA 28 Female White Married_ spouse present 5500 2000 California Sacramento_ CA 3 Female White Never married/single (N/A) 2000 California Sacramento_ CA 0 Male White Never married/single (N/A) 2000 California Sacramento_ CA 42 Male White Married_ spouse present 36000 2000 California Sacramento_ CA 17 Female White Never married/single (N/A) 0 2000 California Sacramento_ CA 6 Male Other Asian or Pacific Islander Never married/single (N/A) 2000 California Sacramento_ CA 1 Male Other Asian or Pacific Islander Never married/single (N/A) 2000 California Sacramento_ CA 40 Male White Married_ spouse present 50000 2000 California Sacramento_ CA 37 Female White Married_ spouse present 70000 2000 California Sacramento_ CA 9 Male White Never 
married/single (N/A) 2000 California Sacramento_ CA 7 Male White Never married/single (N/A) 2000 California Sacramento_ CA 39 Male White Divorced 34400 2000 California Sacramento_ CA 33 Male Other Asian or Pacific Islander Married_ spouse present 18000 2000 California Sacramento_ CA 37 Female Other Asian or Pacific Islander Married_ spouse present 0 2000 California Sacramento_ CA 62 Male Other Asian or Pacific Islander Married_ spouse present 3800 2000 California Sacramento_ CA 27 Male Other race_ nec Married_ spouse absent 15000 2000 California Sacramento_ CA 11 Female Other Asian or Pacific Islander Never married/single (N/A) 2000 California Sacramento_ CA 21 Female Other race_ nec Married_ spouse absent 0 2000 California Sacramento_ CA 5 Male Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 4 Female Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 1 Male Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 1 Male Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 80 Female White Widowed 55100 2000 California Sacramento_ CA 28 Female Other Asian or Pacific Islander Married_ spouse present 27000 2000 California Sacramento_ CA 0 Male Other Asian or Pacific Islander Never married/single (N/A) 2000 California Sacramento_ CA 85 Female White Widowed 13900 2000 California Sacramento_ CA 24 Female White Never married/single (N/A) 20000 2000 California Sacramento_ CA 45 Female Black/Negro Divorced 150000 2000 California Sacramento_ CA 52 Female White Divorced 8300 2000 California Sacramento_ CA 23 Male Black/Negro Never married/single (N/A) 0 2000 California Sacramento_ CA 16 Male Black/Negro Never married/single (N/A) 0 2000 California Sacramento_ CA 43 Female Other Asian or Pacific Islander Married_ spouse present 0 2000 California Sacramento_ CA 62 Male White Married_ spouse present 42000 2000 California Sacramento_ CA 60 Female White Divorced 1400 2000 California Sacramento_ CA 52 Male White Married_ spouse present 70000 2000 California Sacramento_ CA 51 Female White Married_ spouse present 35000 2000 California Sacramento_ CA 49 Female White Divorced 66000 2000 California Sacramento_ CA 29 Male White Married_ spouse present 13500 2000 California Sacramento_ CA 4 Female White Never married/single (N/A) 2000 California Sacramento_ CA 2 Female White Never married/single (N/A) 2000 California Sacramento_ CA 49 Female Other Asian or Pacific Islander Married_ spouse present 5100 2000 California Sacramento_ CA 51 Male Other Asian or Pacific Islander Married_ spouse present 8100 2000 California Sacramento_ CA 19 Female Other Asian or Pacific Islander Never married/single (N/A) 8000 2000 California Sacramento_ CA 25 Male Other Asian or Pacific Islander Married_ spouse present 32000 2000 California Sacramento_ CA 55 Female White Married_ spouse present 51800 2000 California Sacramento_ CA 39 Female White Never married/single (N/A) 25000 2000 California Sacramento_ CA 39 Male White Married_ spouse absent 95000 2000 California Sacramento_ CA 25 Female American Indian or Alaska Native Never married/single (N/A) 32000 2000 California Sacramento_ CA 24 Female White Married_ spouse present 0 2000 California Sacramento_ CA 4 Male White Never married/single (N/A) 2000 California Sacramento_ CA 77 Male White Married_ spouse present 55000 2000 California Sacramento_ CA 63 Female Two major races Married_ spouse present 51000 2000 California Sacramento_ CA 33 Male White Married_ spouse present 20000 2000 California 
Sacramento_ CA 12 Male White Never married/single (N/A) 2000 California Sacramento_ CA 4 Male White Never married/single (N/A) 2000 California Sacramento_ CA 1 Female White Never married/single (N/A) 2000 California Sacramento_ CA 35 Male Black/Negro Divorced 850 2000 California Sacramento_ CA 44 Male White Married_ spouse present 80000 2000 California Sacramento_ CA 44 Female White Married_ spouse present 44000 2000 California Sacramento_ CA 18 Male White Never married/single (N/A) 0 2000 California Sacramento_ CA 15 Female White Never married/single (N/A) 0 2000 California Sacramento_ CA 19 Male Two major races Never married/single (N/A) 3750 2000 California Sacramento_ CA 37 Male Black/Negro Married_ spouse present 20000 2000 California Sacramento_ CA 1 Male Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 30 Male White Married_ spouse present 36000 2000 California Sacramento_ CA 39 Male White Married_ spouse absent 55000 2000 California Sacramento_ CA 41 Female White Married_ spouse absent 0 2000 California Sacramento_ CA 36 Female White Never married/single (N/A) 32000 2000 California Sacramento_ CA 33 Female White Divorced 36000 2000 California Sacramento_ CA 18 Male Other race_ nec Never married/single (N/A) 2010 2000 California Sacramento_ CA 2 Male Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 49 Male White Married_ spouse present 76300 2000 California Sacramento_ CA 46 Female White Married_ spouse present 41000 2000 California Sacramento_ CA 20 Female Black/Negro Never married/single (N/A) 10000 2000 California Sacramento_ CA 35 Male White Divorced 9600 2000 California Sacramento_ CA 59 Male White Divorced 54000 2000 California Sacramento_ CA 44 Female White Never married/single (N/A) 29000 2000 California Sacramento_ CA 15 Female White Never married/single (N/A) 0 2000 California Sacramento_ CA 51 Male Japanese Married_ spouse present 12000 2000 California Sacramento_ CA 19 Male White Never married/single (N/A) 2000 2000 California Sacramento_ CA 16 Male White Never married/single (N/A) 0 2000 California Sacramento_ CA 14 Female White Never married/single (N/A) 2000 California Sacramento_ CA 54 Female White Married_ spouse present 39400 2000 California Sacramento_ CA 51 Female White Married_ spouse present 0 2000 California Sacramento_ CA 12 Male White Never married/single (N/A) 2000 California Sacramento_ CA 30 Female White Married_ spouse present 40000 2000 California Sacramento_ CA 29 Male White Married_ spouse present 30000 2000 California Sacramento_ CA 0 Male White Never married/single (N/A) 2000 California Sacramento_ CA 63 Female White Married_ spouse present 22100 2000 California Sacramento_ CA 46 Female White Divorced 17900 2000 California Sacramento_ CA 26 Male White Never married/single (N/A) 20000 2000 California Sacramento_ CA 46 Female Black/Negro Divorced 23000 2000 California Sacramento_ CA 24 Male Black/Negro Never married/single (N/A) 25000 2000 California Sacramento_ CA 80 Male White Married_ spouse absent 12000 2000 California Sacramento_ CA 36 Male White Married_ spouse present 10900 2000 California Sacramento_ CA 29 Male White Married_ spouse absent 160000 2000 California Sacramento_ CA 64 Male White Divorced 14000 2000 California Sacramento_ CA 27 Female White Married_ spouse present 19600 2000 California Sacramento_ CA 29 Male White Married_ spouse present 68000 2000 California Sacramento_ CA 93 Male White Married_ spouse present 39300 2000 California Sacramento_ CA 22 Male Other Asian or Pacific 
Islander Never married/single (N/A) 12000 2000 California Sacramento_ CA 23 Male Other Asian or Pacific Islander Never married/single (N/A) 6700 2000 California Sacramento_ CA 38 Male White Divorced 50000 2000 California Sacramento_ CA 40 Female White Married_ spouse present 52490 2000 California Sacramento_ CA 39 Male White Married_ spouse present 62400 2000 California Sacramento_ CA 11 Male White Never married/single (N/A) 2000 California Sacramento_ CA 25 Female Other Asian or Pacific Islander Married_ spouse present 0 2000 California Sacramento_ CA 8 Female Other Asian or Pacific Islander Never married/single (N/A) 2000 California Sacramento_ CA 44 Male Two major races Married_ spouse present 16000 2000 California Sacramento_ CA 39 Female Two major races Married_ spouse present 6900 2000 California Sacramento_ CA 21 Male Other race_ nec Never married/single (N/A) 4000 2000 California Sacramento_ CA 20 Male White Never married/single (N/A) 13000 2000 California Sacramento_ CA 17 Female White Never married/single (N/A) 0 2000 California Sacramento_ CA 12 Female Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 21 Male Other race_ nec Never married/single (N/A) 16700 2000 California Sacramento_ CA 37 Female Black/Negro Separated 24900 2000 California Sacramento_ CA 33 Male Black/Negro Never married/single (N/A) 16100 2000 California Sacramento_ CA 15 Male Black/Negro Never married/single (N/A) 7100 2000 California Sacramento_ CA 7 Female Black/Negro Never married/single (N/A) 2000 California Sacramento_ CA 12 Female Other Asian or Pacific Islander Never married/single (N/A) 2000 California Sacramento_ CA 4 Female Other Asian or Pacific Islander Never married/single (N/A) 2000 California Sacramento_ CA 1 Male White Never married/single (N/A) 2000 California Sacramento_ CA 16 Male White Never married/single (N/A) 0 2000 California Sacramento_ CA 15 Male White Never married/single (N/A) 0 2000 California Sacramento_ CA 65 Male White Married_ spouse present 44800 2000 California Sacramento_ CA 71 Female White Married_ spouse present 16000 2000 California Sacramento_ CA 71 Male White Married_ spouse present 86700 2000 California Sacramento_ CA 65 Female White Married_ spouse present 3000 2000 California Sacramento_ CA 40 Female White Married_ spouse present 12500 2000 California Sacramento_ CA 12 Female White Never married/single (N/A) 2000 California Sacramento_ CA 52 Female White Married_ spouse present 16600 2000 California Sacramento_ CA 40 Female Two major races Married_ spouse present 170000 2000 California Sacramento_ CA 46 Male Two major races Married_ spouse present 70000 2000 California Sacramento_ CA 31 Male White Married_ spouse present 32000 2000 California Sacramento_ CA 8 Male White Never married/single (N/A) 2000 California Sacramento_ CA 36 Male Other race_ nec Married_ spouse present 40040 2000 California Sacramento_ CA 8 Male Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 45 Male White Married_ spouse present 10000 2000 California Sacramento_ CA 36 Female White Married_ spouse present 34000 2000 California Sacramento_ CA 29 Male White Married_ spouse present 600 2000 California Sacramento_ CA 24 Female White Married_ spouse present 0 2000 California Sacramento_ CA 43 Male White Married_ spouse present 6000 2000 California Sacramento_ CA 2 Male White Never married/single (N/A) 2000 California Sacramento_ CA 0 Female White Never married/single (N/A) 2000 California Sacramento_ CA 37 Male Black/Negro Married_ spouse present 
50020 2000 California Sacramento_ CA 35 Female Black/Negro Married_ spouse present 50020 2000 California Sacramento_ CA 50 Male White Married_ spouse present 92000 2000 California Sacramento_ CA 49 Female White Married_ spouse present 70000 2000 California Sacramento_ CA 7 Female White Never married/single (N/A) 2000 California Sacramento_ CA 7 Female White Never married/single (N/A) 2000 California Sacramento_ CA 35 Male White Married_ spouse present 15000 2000 California Sacramento_ CA 27 Female White Married_ spouse present 16000 2000 California Sacramento_ CA 38 Female Other Asian or Pacific Islander Never married/single (N/A) 30000 2000 California Sacramento_ CA 58 Male White Married_ spouse present 134000 2000 California Sacramento_ CA 53 Female Two major races Married_ spouse present 0 2000 California Sacramento_ CA 41 Male White Married_ spouse present 170000 2000 California Sacramento_ CA 14 Female White Never married/single (N/A) 2000 California Sacramento_ CA 48 Female Two major races Married_ spouse present 0 2000 California Sacramento_ CA 33 Female Two major races Married_ spouse present 65200 2000 California Sacramento_ CA 82 Female White Widowed 49700 2000 California Sacramento_ CA 50 Male White Married_ spouse present 79100 2000 California Sacramento_ CA 47 Female White Married_ spouse present 14200 2000 California Sacramento_ CA 30 Male Two major races Married_ spouse present 20000 2000 California Sacramento_ CA 44 Female Two major races Married_ spouse present 105200 2000 California Sacramento_ CA 41 Female White Married_ spouse present 20000 2000 California Sacramento_ CA 4 Female White Never married/single (N/A) 2000 California Sacramento_ CA 1 Female White Never married/single (N/A) 2000 California Sacramento_ CA 25 Female White Never married/single (N/A) 29200 2000 California Sacramento_ CA 7 Female White Never married/single (N/A) 2000 California Sacramento_ CA 53 Female White Married_ spouse present 60000 2000 California Sacramento_ CA 19 Female White Never married/single (N/A) 2000 2000 California Sacramento_ CA 93 Male White Divorced 22600 2000 California Sacramento_ CA 32 Male White Divorced 12000 2000 California Sacramento_ CA 50 Female White Married_ spouse present 34000 2000 California Sacramento_ CA 53 Male White Married_ spouse present 24600 2000 California Sacramento_ CA 41 Male White Married_ spouse present 50000 2000 California Sacramento_ CA 38 Female White Married_ spouse present 21000 2000 California Sacramento_ CA 15 Female White Never married/single (N/A) 0 2000 California Sacramento_ CA 8 Male White Never married/single (N/A) 2000 California Sacramento_ CA 0 Male White Never married/single (N/A) 2000 California Sacramento_ CA 79 Female White Widowed 16000 2000 California Sacramento_ CA 63 Female White Widowed 206900 2000 California Sacramento_ CA 41 Male White Married_ spouse present 10000 2000 California Sacramento_ CA 40 Female White Married_ spouse present 5600 2000 California Sacramento_ CA 34 Female White Never married/single (N/A) 24500 2000 California Sacramento_ CA 11 Male White Never married/single (N/A) 2000 California Sacramento_ CA 51 Male White Married_ spouse present 84900 2000 California Sacramento_ CA 11 Female White Never married/single (N/A) 2000 California Sacramento_ CA 66 Male White Married_ spouse present 9300 2000 California Sacramento_ CA 65 Female White Married_ spouse present 5600 2000 California Sacramento_ CA 52 Male White Married_ spouse present 60400 2000 California Sacramento_ CA 31 Male White Never married/single 
(N/A) 25000 2000 California Sacramento_ CA 54 Female White Divorced 25000 2000 California Sacramento_ CA 7 Male White Never married/single (N/A) 2000 California Sacramento_ CA 5 Female White Never married/single (N/A) 2000 California Sacramento_ CA 43 Female White Married_ spouse present 5000 2000 California Sacramento_ CA 12 Male White Never married/single (N/A) 2000 California Sacramento_ CA 44 Male White Never married/single (N/A) 20000 2000 California Sacramento_ CA 69 Female White Widowed 30400 2000 California Sacramento_ CA 52 Female White Separated 13000 2000 California Sacramento_ CA 42 Male White Married_ spouse present 81000 2000 California Sacramento_ CA 47 Female White Married_ spouse present 13400 2000 California Sacramento_ CA 12 Female White Never married/single (N/A) 2000 California Sacramento_ CA 59 Female White Married_ spouse present 30400 2000 California Sacramento_ CA 14 Female White Never married/single (N/A) 2000 California Sacramento_ CA 6 Male White Never married/single (N/A) 2000 California Sacramento_ CA 1 Male White Never married/single (N/A) 2000 California Sacramento_ CA 28 Female White Married_ spouse present 37600 2000 California Sacramento_ CA 1 Female White Never married/single (N/A) 2000 California Sacramento_ CA 0 Female White Never married/single (N/A) 2000 California Sacramento_ CA 62 Male White Married_ spouse present 115000 2000 California Sacramento_ CA 83 Female Chinese Widowed 82100 2000 California Sacramento_ CA 9 Female White Never married/single (N/A) 2000 California Sacramento_ CA 50 Male White Married_ spouse present 50000 2000 California Sacramento_ CA 48 Male White Married_ spouse present 13600 2000 California Sacramento_ CA 23 Male White Never married/single (N/A) 900 2000 Ohio Cleveland_ OH 76 Male White Married_ spouse absent 33000 2000 Ohio Cleveland_ OH 68 Male White Married_ spouse present 41300 2000 Ohio Cleveland_ OH 46 Female White Married_ spouse present 47700 2000 Ohio Cleveland_ OH 45 Male White Widowed 6690 2000 Ohio Cleveland_ OH 48 Male White Married_ spouse present 90000 2000 Ohio Cleveland_ OH 48 Female White Married_ spouse present 21000 2000 Ohio Cleveland_ OH 15 Female White Never married/single (N/A) 0 2000 Ohio Cleveland_ OH 50 Male White Married_ spouse present 81000 2000 Ohio Cleveland_ OH 50 Female White Married_ spouse present 17000 2000 Ohio Cleveland_ OH 62 Female White Married_ spouse present 2300 2000 Ohio Cleveland_ OH 30 Male White Married_ spouse present 35200 2000 Ohio Cleveland_ OH 31 Female White Married_ spouse present 24600 2000 Ohio Cleveland_ OH 5 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 49 Male White Married_ spouse present 127000 2000 Ohio Cleveland_ OH 16 Male White Never married/single (N/A) 130 2000 Ohio Cleveland_ OH 88 Female White Widowed 19900 2000 Ohio Cleveland_ OH 35 Female White Married_ spouse present 10000 2000 Ohio Cleveland_ OH 38 Male White Never married/single (N/A) 18200 2000 Ohio Cleveland_ OH 67 Female White Married_ spouse present 8400 2000 Ohio Cleveland_ OH 48 Female White Married_ spouse present 0 2000 Ohio Cleveland_ OH 32 Male White Married_ spouse present 43000 2000 Ohio Cleveland_ OH 21 Female Black/Negro Never married/single (N/A) 10600 2000 Ohio Cleveland_ OH 32 Female White Married_ spouse present 27600 2000 Ohio Cleveland_ OH 12 Female Two major races Never married/single (N/A) 2000 Ohio Cleveland_ OH 7 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 5 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 38 Female White 
Married_ spouse present 0 2000 Ohio Cleveland_ OH 11 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 8 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 5 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 0 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 42 Male White Married_ spouse present 96030 2000 Ohio Cleveland_ OH 42 Female White Married_ spouse present 28000 2000 Ohio Cleveland_ OH 25 Male White Divorced 25000 2000 Ohio Cleveland_ OH 54 Male White Married_ spouse present 31000 2000 Ohio Cleveland_ OH 52 Female White Married_ spouse present 36100 2000 Ohio Cleveland_ OH 29 Male White Separated 61000 2000 Ohio Cleveland_ OH 58 Female White Divorced 6100 2000 Ohio Cleveland_ OH 70 Male White Married_ spouse present 0 2000 Ohio Cleveland_ OH 24 Female White Never married/single (N/A) 27000 2000 Ohio Cleveland_ OH 14 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 12 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 43 Female Other race_ nec Married_ spouse present 39000 2000 Ohio Cleveland_ OH 13 Male Other race_ nec Never married/single (N/A) 2000 Ohio Cleveland_ OH 20 Female Other race_ nec Never married/single (N/A) 1540 2000 Ohio Cleveland_ OH 54 Female White Divorced 55000 2000 Ohio Cleveland_ OH 15 Female White Never married/single (N/A) 0 2000 Ohio Cleveland_ OH 13 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 38 Female White Married_ spouse present 29000 2000 Ohio Cleveland_ OH 42 Female American Indian or Alaska Native Never married/single (N/A) 14100 2000 Ohio Cleveland_ OH 58 Male White Married_ spouse present 152400 2000 Ohio Cleveland_ OH 26 Male White Married_ spouse present 36600 2000 Ohio Cleveland_ OH 8 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 5 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 0 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 65 Male White Married_ spouse present 20100 2000 Ohio Cleveland_ OH 56 Female White Married_ spouse present 300 2000 Ohio Cleveland_ OH 61 Female White Married_ spouse present 6400 2000 Ohio Cleveland_ OH 50 Male White Married_ spouse present 75600 2000 Ohio Cleveland_ OH 41 Female White Married_ spouse present 490 2000 Ohio Cleveland_ OH 11 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 86 Male White Widowed 29500 2000 Ohio Cleveland_ OH 45 Male White Divorced 67900 2000 Ohio Cleveland_ OH 33 Male White Never married/single (N/A) 22000 2000 Ohio Cleveland_ OH 51 Female White Married_ spouse absent 9600 2000 Ohio Cleveland_ OH 15 Female White Never married/single (N/A) 0 2000 Ohio Cleveland_ OH 32 Female White Divorced 35500 2000 Ohio Cleveland_ OH 22 Female White Never married/single (N/A) 24000 2000 Ohio Cleveland_ OH 10 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 77 Male White Married_ spouse present 13210 2000 Ohio Cleveland_ OH 75 Female White Married_ spouse present 14920 2000 Ohio Cleveland_ OH 57 Female White Married_ spouse present 700 2000 Ohio Cleveland_ OH 23 Female White Never married/single (N/A) 11000 2000 Ohio Cleveland_ OH 4 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 37 Male White Married_ spouse present 30100 2000 Ohio Cleveland_ OH 72 Female White Married_ spouse present 5300 2000 Ohio Cleveland_ OH 62 Female White Divorced 9000 2000 Ohio Cleveland_ OH 77 Male White Divorced 10780 2000 Ohio Cleveland_ OH 41 Male White Never married/single (N/A) 18000 2000 Ohio Cleveland_ OH 52 Female White Divorced 48700 2000 
Ohio Cleveland_ OH 53 Male White Divorced 35000 2000 Ohio Cleveland_ OH 43 Male White Married_ spouse present 62000 2000 Ohio Cleveland_ OH 14 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 10 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 56 Male White Married_ spouse present 59300 2000 Ohio Cleveland_ OH 53 Female White Married_ spouse present 35000 2000 Ohio Cleveland_ OH 60 Male White Married_ spouse present 36004 2000 Ohio Cleveland_ OH 57 Female White Married_ spouse present 25010 2000 Ohio Cleveland_ OH 50 Male White Married_ spouse present 37700 2000 Ohio Cleveland_ OH 45 Female White Married_ spouse present 33600 2000 Ohio Cleveland_ OH 18 Male White Never married/single (N/A) 8840 2000 Ohio Cleveland_ OH 11 Male Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 39 Female White Married_ spouse present 24800 2000 Ohio Cleveland_ OH 35 Male White Married_ spouse present 54450 2000 Ohio Cleveland_ OH 2 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 56 Female White Divorced 54400 2000 Ohio Cleveland_ OH 93 Female White Widowed 0 2000 Ohio Cleveland_ OH 69 Male White Widowed 47990 2000 Ohio Cleveland_ OH 51 Male White Married_ spouse present 131000 2000 Ohio Cleveland_ OH 53 Female White Married_ spouse present 70000 2000 Ohio Cleveland_ OH 80 Male White Married_ spouse present 43200 2000 Ohio Cleveland_ OH 68 Female White Married_ spouse present 70800 2000 Ohio Cleveland_ OH 38 Male White Never married/single (N/A) 25000 2000 Ohio Cleveland_ OH 34 Female White Married_ spouse present 30000 2000 Ohio Cleveland_ OH 70 Female White Never married/single (N/A) 66700 2000 Ohio Cleveland_ OH 57 Male White Married_ spouse present 60000 2000 Ohio Cleveland_ OH 47 Female White Never married/single (N/A) 22000 2000 Ohio Cleveland_ OH 67 Female White Married_ spouse absent 28900 2000 Ohio Cleveland_ OH 35 Female White Divorced 24100 2000 Ohio Cleveland_ OH 15 Male White Never married/single (N/A) 900 2000 Ohio Cleveland_ OH 13 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 58 Female White Widowed 48900 2000 Ohio Cleveland_ OH 72 Female White Widowed 13600 2000 Ohio Cleveland_ OH 27 Female Black/Negro Never married/single (N/A) 20700 2000 Ohio Cleveland_ OH 7 Male Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 4 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 0 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 73 Male White Widowed 18500 2000 Ohio Cleveland_ OH 65 Male Black/Negro Married_ spouse present 21800 2000 Ohio Cleveland_ OH 66 Female Black/Negro Married_ spouse present 3600 2000 Ohio Cleveland_ OH 63 Male White Married_ spouse present 9000 2000 Ohio Cleveland_ OH 60 Female White Married_ spouse present 0 2000 Ohio Cleveland_ OH 79 Male White Never married/single (N/A) 13400 2000 Ohio Cleveland_ OH 83 Female White Widowed 10400 2000 Ohio Cleveland_ OH 7 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 83 Male White Widowed 18000 2000 Ohio Cleveland_ OH 62 Male Black/Negro Never married/single (N/A) 6000 2000 Ohio Cleveland_ OH 12 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 1 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 14 Male Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 8 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 71 Female White Never married/single (N/A) 6000 2000 Ohio Cleveland_ OH 68 Female White Widowed 13570 2000 Ohio Cleveland_ OH 51 Male White 
Married_ spouse present 32700 2000 Ohio Cleveland_ OH 50 Female White Married_ spouse present 32600 2000 Ohio Cleveland_ OH 19 Male White Never married/single (N/A) 4560 2000 Ohio Cleveland_ OH 53 Female White Never married/single (N/A) 42000 2000 Ohio Cleveland_ OH 24 Female Black/Negro Never married/single (N/A) 2600 2000 Ohio Cleveland_ OH 2 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 77 Female White Widowed 12390 2000 Ohio Cleveland_ OH 37 Female Black/Negro Never married/single (N/A) 23200 2000 Ohio Cleveland_ OH 16 Female Black/Negro Never married/single (N/A) 500 2000 Ohio Cleveland_ OH 13 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 16 Female Black/Negro Never married/single (N/A) 500 2000 Ohio Cleveland_ OH 2 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 12 Male Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 11 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 66 Male Black/Negro Divorced 10010 2000 Ohio Cleveland_ OH 41 Male Black/Negro Never married/single (N/A) 5600 2000 Ohio Cleveland_ OH 30 Female Black/Negro Never married/single (N/A) 12004 2000 Ohio Cleveland_ OH 42 Female White Divorced 41600 2000 Ohio Cleveland_ OH 65 Male White Divorced 44000 2000 Ohio Cleveland_ OH 47 Male White Never married/single (N/A) 0 2000 Ohio Cleveland_ OH 47 Female White Never married/single (N/A) 10600 2000 Ohio Cleveland_ OH 59 Female White Divorced 0 2000 Ohio Cleveland_ OH 86 Female Black/Negro Widowed 17860 2000 Ohio Cleveland_ OH 70 Male Black/Negro Married_ spouse present 20800 2000 Ohio Cleveland_ OH 69 Female Black/Negro Married_ spouse present 29900 2000 Ohio Cleveland_ OH 52 Female White Divorced 500 2000 Ohio Cleveland_ OH 49 Female Black/Negro Never married/single (N/A) 5500 2000 Ohio Cleveland_ OH 63 Female White Divorced 7600 2000 Ohio Cleveland_ OH 33 Male White Never married/single (N/A) 31900 2000 Ohio Cleveland_ OH 73 Female White Widowed 8700 2000 Ohio Cleveland_ OH 27 Male White Divorced 19000 2000 Ohio Cleveland_ OH 39 Male White Married_ spouse present 86000 2000 Ohio Cleveland_ OH 39 Female White Married_ spouse present 20000 2000 Ohio Cleveland_ OH 14 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 0 Male Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 37 Male Black/Negro Never married/single (N/A) 37060 2000 Ohio Cleveland_ OH 32 Male White Married_ spouse present 0 2000 Ohio Cleveland_ OH 59 Female Two major races Divorced 30000 2000 Ohio Cleveland_ OH 57 Female White Married_ spouse present 126000 2000 Ohio Cleveland_ OH 74 Male White Married_ spouse present 29900 2000 Ohio Cleveland_ OH 71 Female White Married_ spouse present 6500 2000 Ohio Cleveland_ OH 40 Male White Never married/single (N/A) 7800 2000 Ohio Cleveland_ OH 42 Female White Separated 19500 2000 Ohio Cleveland_ OH 79 Male White Married_ spouse present 32500 2000 Ohio Cleveland_ OH 47 Male White Married_ spouse present 35000 2000 Ohio Cleveland_ OH 49 Female White Married_ spouse present 20000 2000 Ohio Cleveland_ OH 9 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 85 Male White Widowed 17500 2000 Ohio Cleveland_ OH 40 Female White Divorced 23200 2000 Ohio Cleveland_ OH 14 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 84 Female White Widowed 7500 2000 Ohio Cleveland_ OH 53 Female White Never married/single (N/A) 32000 2000 Ohio Cleveland_ OH 26 Male Black/Negro Married_ spouse present 18200 2000 Ohio Cleveland_ OH 1 Male Black/Negro 
Never married/single (N/A) 2000 Ohio Cleveland_ OH 75 Male White Married_ spouse present 17000 2000 Ohio Cleveland_ OH 35 Female White Married_ spouse present 30000 2000 Ohio Cleveland_ OH 9 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 3 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 3 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 74 Female White Married_ spouse present 27000 2000 Ohio Cleveland_ OH 18 Female White Never married/single (N/A) 2800 2000 Ohio Cleveland_ OH 54 Female Black/Negro Divorced 11000 2000 Ohio Cleveland_ OH 58 Male Black/Negro Divorced 10000 2000 Ohio Cleveland_ OH 29 Male White Never married/single (N/A) 30300 2000 Ohio Cleveland_ OH 51 Male White Separated 40000 2000 Ohio Cleveland_ OH 63 Female White Divorced 60000 2000 Ohio Cleveland_ OH 67 Female White Widowed 9200 2000 Ohio Cleveland_ OH 69 Female Black/Negro Widowed 8500 2000 Ohio Cleveland_ OH 40 Female White Married_ spouse present 13130 2000 Ohio Cleveland_ OH 62 Male White Married_ spouse present 45500 2000 Ohio Cleveland_ OH 61 Female White Married_ spouse present 49000 2000 Ohio Cleveland_ OH 18 Male White Never married/single (N/A) 29000 2000 Ohio Cleveland_ OH 61 Male White Married_ spouse present 20000 2000 Ohio Cleveland_ OH 38 Male White Married_ spouse present 46200 2000 Ohio Cleveland_ OH 40 Male White Married_ spouse present 112000 2000 Ohio Cleveland_ OH 34 Female White Married_ spouse present 3200 2000 Ohio Cleveland_ OH 19 Male White Never married/single (N/A) 13000 2000 Ohio Cleveland_ OH 36 Male White Married_ spouse present 53500 2000 Ohio Cleveland_ OH 60 Male White Married_ spouse present 49200 2000 Ohio Cleveland_ OH 47 Female White Married_ spouse present 25350 2000 Ohio Cleveland_ OH 26 Female White Never married/single (N/A) 29300 2000 Ohio Cleveland_ OH 51 Female White Married_ spouse present 0 2000 Ohio Cleveland_ OH 34 Female White Separated 2500 2000 Ohio Cleveland_ OH 3 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 45 Male White Married_ spouse present 51910 2000 Ohio Cleveland_ OH 7 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 31 Female White Married_ spouse present 33000 2000 Ohio Cleveland_ OH 0 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 41 Female White Married_ spouse present 8000 2000 Ohio Cleveland_ OH 22 Male White Never married/single (N/A) 3200 2000 Ohio Cleveland_ OH 12 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 35 Male Other race_ nec Married_ spouse absent 13800 2000 Ohio Cleveland_ OH 11 Female Other race_ nec Never married/single (N/A) 2000 Ohio Cleveland_ OH 26 Male Other race_ nec Married_ spouse present 12000 2000 Ohio Cleveland_ OH 47 Male Other race_ nec Married_ spouse absent 12400 2000 Ohio Cleveland_ OH 62 Male White Married_ spouse present 0 2000 Ohio Cleveland_ OH 63 Female White Married_ spouse present 0 2000 Ohio Cleveland_ OH 18 Female White Never married/single (N/A) 0 2000 Ohio Cleveland_ OH 17 Male White Never married/single (N/A) 830 2000 Ohio Cleveland_ OH 41 Female White Divorced 12000 2000 Ohio Cleveland_ OH 15 Female White Never married/single (N/A) 0 2000 Ohio Cleveland_ OH 26 Male White Married_ spouse present 75400 2000 Ohio Cleveland_ OH 30 Female White Married_ spouse present 5600 2000 Ohio Cleveland_ OH 51 Male White Married_ spouse present 36100 2000 Ohio Cleveland_ OH 44 Female White Married_ spouse present 30100 2000 Ohio Cleveland_ OH 22 Female White Never married/single (N/A) 18000 2000 Ohio Cleveland_ OH 18 
Female White Never married/single (N/A) 6000 2000 Ohio Cleveland_ OH 21 Female White Never married/single (N/A) 6500 2000 Ohio Cleveland_ OH 74 Male White Married_ spouse present 23100 2000 Ohio Cleveland_ OH 27 Male White Divorced 60000 2000 Ohio Cleveland_ OH 8 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 47 Male White Married_ spouse present 33600 2000 Ohio Cleveland_ OH 72 Female White Widowed 21000 2000 Ohio Cleveland_ OH 61 Male White Married_ spouse present 143230 2000 Ohio Cleveland_ OH 16 Female White Never married/single (N/A) 0 2000 Ohio Cleveland_ OH 13 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 43 Male White Married_ spouse present 116390 2000 Ohio Cleveland_ OH 39 Female White Married_ spouse present 23000 2000 Ohio Cleveland_ OH 10 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 6 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 62 Male White Widowed 420 2000 Ohio Cleveland_ OH 53 Male White Married_ spouse present 70000 2000 Ohio Cleveland_ OH 52 Female White Married_ spouse present 30000 2000 Ohio Cleveland_ OH 50 Male White Divorced 16500 2000 Ohio Cleveland_ OH 59 Female White Divorced 27200 2000 Ohio Cleveland_ OH 40 Male White Married_ spouse present 23700 2000 Ohio Cleveland_ OH 40 Female White Married_ spouse present 19800 2000 Ohio Cleveland_ OH 9 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 32 Female White Married_ spouse present 25700 2000 Ohio Cleveland_ OH 4 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 34 Male White Never married/single (N/A) 36000 2000 Ohio Cleveland_ OH 73 Female White Widowed 25820 2000 Ohio Cleveland_ OH 59 Male White Married_ spouse present 34000 2000 Ohio Cleveland_ OH 60 Female White Married_ spouse present 21000 \ No newline at end of file diff --git a/docs/previous_versions/v0.4.0/data/dem_score.csv b/docs/previous_versions/v0.4.0/data/dem_score.csv new file mode 100755 index 000000000..c48fc1f49 --- /dev/null +++ b/docs/previous_versions/v0.4.0/data/dem_score.csv @@ -0,0 +1,97 @@ +country,1952,1957,1962,1967,1972,1977,1982,1987,1992 +Albania,-9,-9,-9,-9,-9,-9,-9,-9,5 +Argentina,-9,-1,-1,-9,-9,-9,-8,8,7 +Armenia,-9,-7,-7,-7,-7,-7,-7,-7,7 +Australia,10,10,10,10,10,10,10,10,10 +Austria,10,10,10,10,10,10,10,10,10 +Azerbaijan,-9,-7,-7,-7,-7,-7,-7,-7,1 +Belarus,-9,-7,-7,-7,-7,-7,-7,-7,7 +Belgium,10,10,10,10,10,10,10,10,10 +Bhutan,-10,-10,-10,-10,-10,-10,-10,-10,-10 +Bolivia,-4,-3,-3,-4,-7,-7,8,9,9 +Brazil,5,5,5,-9,-9,-4,-3,7,8 +Bulgaria,-7,-7,-7,-7,-7,-7,-7,-7,8 +Canada,10,10,10,10,10,10,10,10,10 +Chile,2,5,5,6,6,-7,-7,-6,8 +China,-8,-8,-8,-9,-8,-7,-7,-7,-7 +Colombia,-5,7,7,7,7,8,8,8,9 +Costa Rica,10,10,10,10,10,10,10,10,10 +Croatia,-7,-7,-7,-7,-7,-7,-5,-5,-3 +Cuba,0,-9,-7,-7,-7,-7,-7,-7,-7 +Czech Rep.,-7,-7,-7,-7,-7,-7,-7,-7,8 +Denmark,10,10,10,10,10,10,10,10,10 +Dominican Rep.,-9,-9,8,-3,-3,-3,6,6,6 +Ecuador,2,2,-1,-1,-5,-5,9,8,9 +Egypt,-7,-7,-7,-7,-7,-6,-6,-6,-6 +El Salvador,-6,-5,-3,0,-1,-6,2,6,7 +Estonia,-9,-7,-7,-7,-7,-7,-7,-7,6 +Ethiopia,-9,-9,-9,-9,-9,-7,-7,-8,0 +Finland,10,10,10,10,10,10,10,10,10 +France,10,10,5,5,8,8,8,9,9 +Georgia,-9,-7,-7,-7,-7,-7,-7,-7,4 +Germany,10,10,10,10,10,10,10,10,10 +Greece,4,4,4,-7,-7,8,8,10,10 +Guatemala,2,-6,-5,3,1,-3,-7,3,3 +Haiti,-5,-5,-9,-9,-10,-9,-9,-8,-7 +Honduras,-3,-1,-1,-1,-1,-1,6,5,6 +Hungary,-7,-7,-7,-7,-7,-7,-7,-7,10 +India,9,9,9,9,9,8,8,8,8 +Indonesia,0,-1,-5,-7,-7,-7,-7,-7,-7 +Iran,-1,-10,-10,-10,-10,-10,-6,-6,-6 +Iraq,-4,-4,-5,-5,-7,-7,-9,-9,-9 +Ireland,10,10,10,10,10,10,10,10,10 
+Israel,10,10,10,9,9,9,9,9,9 +Italy,10,10,10,10,10,10,10,10,10 +Japan,10,10,10,10,10,10,10,10,10 +Jordan,-1,-9,-9,-9,-9,-10,-10,-9,-2 +Kazakhstan,-9,-7,-7,-7,-7,-7,-7,-7,-3 +"Korea, Dem. Rep.",-7,-8,-8,-9,-9,-9,-9,-9,-9 +"Korea, Rep.",-4,-4,-7,3,-9,-8,-5,1,6 +Kyrgyzstan,-9,-7,-7,-7,-7,-7,-7,-7,-3 +Latvia,-9,-7,-7,-7,-7,-7,-7,-7,8 +Lebanon,2,2,2,2,5,0,0,0,0 +Liberia,-6,-6,-6,-6,-6,-6,-7,-6,0 +Libya,-7,-7,-7,-7,-7,-7,-7,-7,-7 +Lithuania,-9,-7,-7,-7,-7,-7,-7,-7,10 +"Macedonia, FYR",-7,-7,-7,-7,-7,-7,-5,-5,6 +Mexico,-6,-6,-6,-6,-6,-3,-3,-3,0 +Moldova,-9,-7,-7,-7,-7,-7,-7,-7,5 +Mongolia,-7,-7,-7,-7,-7,-7,-7,-7,9 +Montenegro,-7,-7,-7,-7,-7,-7,-5,-5,-5 +Myanmar,8,8,-6,-7,-7,-6,-8,-8,-7 +Nepal,-7,-4,-9,-9,-9,-9,-2,-2,5 +Netherlands,10,10,10,10,10,10,10,10,10 +New Zealand,10,10,10,10,10,10,10,10,10 +Nicaragua,-8,-8,-8,-8,-8,-8,-5,-1,6 +Norway,10,10,10,10,10,10,10,10,10 +Oman,-6,-10,-10,-10,-10,-10,-10,-10,-9 +Pakistan,5,8,1,1,4,-7,-7,-4,8 +Panama,-1,4,4,4,-7,-7,-5,-8,8 +Paraguay,-5,-9,-9,-8,-8,-8,-8,-8,7 +Peru,-2,5,-6,5,-7,-7,7,7,-3 +Philippines,5,5,5,5,-9,-9,-7,8,8 +Poland,-7,-7,-7,-7,-7,-7,-8,-6,8 +Portugal,-9,-9,-9,-9,-9,9,10,10,10 +Romania,-7,-7,-7,-7,-7,-8,-8,-8,5 +Russia,-9,-7,-7,-7,-7,-7,-7,-7,5 +Saudi Arabia,-10,-10,-10,-10,-10,-10,-10,-10,-10 +Serbia,-7,-7,-7,-7,-7,-7,-5,-5,-5 +Slovak Republic,-7,-7,-7,-7,-7,-7,-7,-7,8 +Slovenia,-7,-7,-7,-7,-7,-7,-5,-5,10 +South Africa,4,4,4,4,4,4,4,4,6 +Spain,-7,-7,-7,-7,-7,5,10,10,10 +Sri Lanka,7,7,7,7,8,8,5,5,5 +Sweden,10,10,10,10,10,10,10,10,10 +Switzerland,10,10,10,10,10,10,10,10,10 +Syria,-7,7,-2,-7,-9,-9,-9,-9,-9 +Taiwan,-8,-8,-8,-8,-8,-7,-7,-1,7 +Tajikistan,-9,-7,-7,-7,-7,-7,-7,-7,-6 +Thailand,-6,-3,-7,-7,-7,-2,2,2,9 +Turkey,7,4,9,8,-2,9,-5,7,9 +Turkmenistan,-9,-7,-7,-7,-7,-7,-7,-7,-9 +Ukraine,-9,-7,-7,-7,-7,-7,-7,-7,6 +United Kingdom,10,10,10,10,10,10,10,10,10 +United States,10,10,10,10,10,10,10,10,10 +Uruguay,8,8,8,8,-3,-8,-7,9,10 +Uzbekistan,-9,-7,-7,-7,-7,-7,-7,-7,-9 +Venezuela,-3,-3,6,6,9,9,9,9,8 diff --git a/docs/previous_versions/v0.4.0/data/dem_score.xlsx b/docs/previous_versions/v0.4.0/data/dem_score.xlsx new file mode 100755 index 000000000..85d90daa9 Binary files /dev/null and b/docs/previous_versions/v0.4.0/data/dem_score.xlsx differ diff --git a/docs/previous_versions/v0.4.0/data/ideology.csv b/docs/previous_versions/v0.4.0/data/ideology.csv new file mode 100755 index 000000000..302957298 --- /dev/null +++ b/docs/previous_versions/v0.4.0/data/ideology.csv @@ -0,0 +1,76 @@ +city,state,state_ideology +New York,New York,Liberal +Chicago,Illinois,Liberal +Los Angeles,California,Liberal +Washington,DC,Liberal +Houston,Texas,Conservative +Philadelphia,Pennsylvania,Conservative +Phoenix,Arizona,Conservative +San Diego,California,Liberal +Dallas,Texas,Conservative +Detroit,Michigan,Conservative +San Francisco,California,Liberal +San Antonio,Texas,Conservative +Atlanta,Georgia,Conservative +Las Vegas,Nevada,Liberal +Baltimore,Maryland,Liberal +Boston,Massachusetts,Liberal +"Jacksonville, Fla.",Florida,Conservative +"El Paso, Texas",Texas,Conservative +"Columbus, Ohio",Ohio,Conservative +Cleveland,Ohio,Conservative +"Tucson, Ariz.",Arizona,Conservative +"Newark, N.J.",New Jersey,Liberal +"Austin, Texas",Texas,Conservative +"Memphis, Tenn.",Tennessee,Conservative +Milwaukee,Wisconsin,Conservative +"San Jose, Calif.",California,Liberal +Miami,Florida,Conservative +Denver,Colorado,Liberal +"Sacramento, Calif.",California,Liberal +"Charlotte, N.C.",North Carolina,Conservative +"Tampa, Fla.",Florida,Conservative +Indianapolis,Indiana,Conservative 
+"Santa Ana, Calif.",California,Liberal +New Orleans,Louisiana,Conservative +"Oakland, Calif.",California,Liberal +"Orlando, Fla.",Florida,Conservative +"Oklahoma City, Okla.",Oklahoma,Conservative +Seattle,Washington,Liberal +"Kansas City, Mo.",Missouri,Conservative +"Nashville, Tenn.",Tennessee,Conservative +"Laredo, Texas",Texas,Conservative +"Fort Worth, Texas",Texas,Conservative +"Louisville, Ky.",Kentucky,Conservative +"Norfolk, Va.",Virginia,Liberal +"Arlington, Va.",Virginia,Liberal +Pittsburgh,Pennsylvania,Conservative +"Albuquerque, N.M.",New Mexico,Liberal +"Jersey City, N.J.",New Jersey,Liberal +"Raleigh, N.C.",North Carolina,Conservative +"Rochester, N.Y.",New York,Liberal +Cincinnati,Ohio,Conservative +"Long Beach, Calif.",California,Liberal +"Birmingham, Ala.",Alabama,Conservative +"Wichita, Kan.",Kansas,Conservative +"Virginia Beach, Va.",Virginia,Liberal +"Fresno, Calif.",California,Liberal +"Buffalo, N.Y.",New York,Liberal +Minneapolis,Minneapolis,Liberal +"Portland, Ore.",Oregon,Liberal +"Reno, Nev.",Nevada,Liberal +"Richmond, Va.",Virginia,Liberal +"Baton Rouge, La.",Louisiana,Conservative +"Jackson, Miss.",Mississippi,Conservative +"Riverside, Calif.",California,Liberal +"Fort Lauderdale, Fla.",Florida,Conservative +St. Louis,Missouri,Conservative +"Brownsville, Texas",Texas,Conservative +"Albany, N.Y.",New York,Liberal +"Colorado Springs, Colo.",Colorado,Liberal +"Savannah, Ga.",Georgia,Conservative +"Winston-Salem, N.C.",North Carolina,Conservative +"Toledo, Ohio",Ohio,Conservative +"Madison, Wis.",Wisconsin,Conservative +"Corpus Christi, Texas",Texas,Conservative +"San Bernardino, Calif.",California,Liberal \ No newline at end of file diff --git a/docs/previous_versions/v0.4.0/data/le_mess.csv b/docs/previous_versions/v0.4.0/data/le_mess.csv new file mode 100755 index 000000000..7cc6fb6fc --- /dev/null +++ b/docs/previous_versions/v0.4.0/data/le_mess.csv @@ -0,0 +1,203 @@ +country,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016 +Afghanistan,27.13,27.67,28.19,28.73,29.27,29.8,30.34,30.86,31.4,31.94,32.47,33.01,33.53,34.07,34.6,35.13,35.66,36.17,36.69,37.2,37.7,38.19,38.67,39.14,39.61,40.07,40.53,40.98,41.46,41.96,42.51,43.11,43.75,44.45,45.21,46.02,46.87,47.74,48.62,49.5,49.3,49.4,49.5,48.9,49.4,49.7,49.5,48.6,50.0,50.1,50.4,51.0,51.4,51.8,52.0,52.1,52.4,52.8,53.3,53.6,54.0,54.4,54.8,54.9,53.8,52.72 +Albania,54.72,55.23,55.85,56.59,57.45,58.42,59.48,60.6,61.75,62.87,63.92,64.84,65.6,66.18,66.59,66.88,67.11,67.32,67.55,67.83,68.16,68.53,68.93,69.35,69.77,70.17,70.54,70.86,71.14,71.39,71.63,71.88,72.15,72.42,72.71,72.96,73.14,73.25,73.3,73.3,73.4,73.6,73.6,73.6,73.7,73.8,74.1,74.2,74.2,74.7,75.1,75.5,75.7,75.9,76.2,76.4,76.6,76.8,77.0,77.2,77.4,77.5,77.7,77.9,78.0,78.1 +Algeria,43.03,43.5,43.96,44.44,44.93,45.44,45.94,46.45,46.97,47.5,48.02,48.55,49.07,49.58,50.09,50.58,51.05,51.49,51.95,52.41,52.88,53.38,53.91,54.52,55.24,56.11,57.13,58.28,59.56,60.92,62.31,63.69,64.97,66.15,67.18,68.04,68.75,69.33,69.81,70.2,70.5,70.9,71.2,71.4,71.6,72.1,72.4,72.6,73.0,73.3,73.5,73.8,73.9,74.4,74.8,75.0,75.3,75.5,75.7,76.0,76.1,76.2,76.3,76.3,76.4,76.5 
+Angola,31.05,31.59,32.14,32.69,33.24,33.78,34.33,34.88,35.43,35.98,36.53,37.08,37.63,38.18,38.74,39.28,39.84,40.39,40.95,41.5,42.06,42.62,43.17,43.71,44.22,44.68,45.12,45.5,45.84,46.14,46.42,46.69,46.96,47.23,47.5,47.75,47.99,48.2,48.4,48.6,49.3,49.6,48.4,50.0,50.9,51.3,51.7,51.8,51.8,52.3,52.5,53.3,53.9,54.5,55.2,55.7,56.2,56.7,57.1,57.6,58.1,58.5,58.8,59.2,59.6,60.0 +Antigua and Barbuda,58.26,58.8,59.34,59.87,60.41,60.93,61.45,61.97,62.48,62.97,63.46,63.93,64.38,64.81,65.23,65.63,66.03,66.41,66.81,67.19,67.56,67.94,68.3,68.64,68.99,69.32,69.64,69.96,70.28,70.59,70.9,71.22,71.52,71.82,72.13,72.42,72.7,72.97,73.24,73.5,73.6,73.5,73.4,73.4,73.5,73.5,73.9,74.1,74.0,73.8,74.1,74.3,74.5,74.6,74.9,74.9,75.3,75.5,75.7,75.8,75.9,76.1,76.2,76.3,76.4,76.5 +Argentina,61.93,62.54,63.1,63.59,64.03,64.41,64.73,65.0,65.22,65.39,65.53,65.64,65.74,65.84,65.95,66.08,66.26,66.47,66.72,67.01,67.32,67.64,67.96,68.28,68.6,68.92,69.24,69.57,69.89,70.2,70.51,70.78,71.04,71.26,71.46,71.66,71.84,72.05,72.26,72.5,72.7,72.8,73.1,73.4,73.5,73.5,73.6,73.8,73.9,74.2,74.3,74.3,74.5,75.0,75.3,75.3,75.2,75.4,75.6,75.8,76.0,76.1,76.2,76.3,76.5,76.7 +Armenia,62.67,63.13,63.6,64.07,64.54,65.0,65.45,65.92,66.39,66.86,67.33,67.82,68.3,68.78,69.26,69.74,70.22,70.67,71.1,71.47,71.79,72.02,72.19,72.28,72.33,72.38,72.44,72.53,72.63,72.72,72.73,72.64,72.43,72.1,71.7,71.24,70.82,70.46,70.22,70.1,69.7,68.8,68.3,68.6,69.1,69.4,70.0,70.5,70.8,71.3,71.4,71.6,71.5,71.8,71.8,71.7,72.3,72.3,72.6,73.0,73.5,73.9,74.3,74.5,74.7,74.9 +Aruba,58.96,60.01,60.98,61.87,62.69,63.42,64.09,64.68,65.2,65.66,66.07,66.44,66.79,67.11,67.44,67.76,68.1,68.44,68.78,69.14,69.5,69.85,70.19,70.52,70.83,71.14,71.44,71.74,72.02,72.29,72.54,72.75,72.93,73.07,73.18,73.26,73.33,73.38,73.43,73.47,73.51,73.54,73.57,73.6,73.62,73.65,73.67,73.7,73.73,73.78,73.85,73.94,74.05,74.18,74.32,74.47,74.62,74.77,74.92,75.06,75.19,75.32,75.46,75.59,75.72,75.85 +Australia,68.71,69.11,69.69,69.84,70.16,70.03,70.31,70.86,70.43,70.87,71.14,70.91,70.97,70.63,70.96,70.79,71.07,70.7,71.11,70.78,71.38,71.9,72.11,71.86,72.81,72.84,73.45,73.84,74.4,74.56,74.92,74.7,75.51,75.98,75.41,76.08,76.27,76.3,76.4,77.0,77.4,77.6,77.9,78.1,78.3,78.5,78.8,79.2,79.4,79.8,80.1,80.3,80.6,80.9,81.2,81.4,81.5,81.6,81.8,82.0,82.2,82.4,82.4,82.3,82.3,82.3 +Austria,65.24,66.78,67.27,67.3,67.58,67.7,67.46,68.46,68.39,68.75,69.72,69.51,69.64,70.13,69.92,70.22,70.1,70.25,70.02,70.07,70.27,70.59,71.16,71.15,71.28,71.77,72.12,72.2,72.51,72.64,72.96,73.12,73.19,73.73,73.95,74.43,74.86,75.34,75.43,75.7,75.8,76.0,76.2,76.5,76.8,77.1,77.6,77.8,78.0,78.2,78.6,78.8,79.0,79.4,79.5,80.0,80.1,80.4,80.3,80.5,80.7,80.9,81.1,81.2,81.3,81.4 +Azerbaijan,57.5,57.93,58.36,58.79,59.21,59.63,60.05,60.48,60.9,61.33,61.76,62.2,62.62,63.06,63.49,63.91,64.35,64.75,65.14,65.48,65.75,65.93,66.04,66.05,66.02,65.92,65.8,65.68,65.6,65.55,65.61,65.73,65.92,66.15,66.37,66.48,66.46,66.28,65.98,65.6,65.3,63.7,64.0,63.5,64.6,65.0,65.3,65.6,65.9,66.5,67.2,67.6,67.6,67.8,68.2,68.7,69.1,69.2,69.7,70.1,70.8,71.5,72.1,72.5,72.9,73.3 +Bahamas,58.91,59.29,59.67,60.03,60.39,60.72,61.06,61.38,61.69,62.0,62.29,62.58,62.85,63.13,63.4,63.65,63.91,64.14,64.39,64.61,64.85,65.08,65.3,65.53,65.74,65.96,66.16,66.37,66.57,66.75,66.95,67.12,67.31,67.5,67.67,67.86,68.02,68.2,68.35,68.5,68.9,69.2,69.7,69.5,69.7,70.0,70.2,70.1,70.1,70.2,70.3,70.4,71.1,71.7,71.7,72.0,71.8,72.2,72.7,72.7,72.6,72.7,72.9,73.5,73.7,73.9 
+Bahrain,41.45,42.32,43.26,44.27,45.35,46.49,47.7,48.97,50.29,51.64,52.99,54.33,55.64,56.9,58.1,59.23,60.29,61.29,62.22,63.1,63.92,64.67,65.38,66.03,66.63,67.2,67.72,68.21,68.67,69.09,69.47,69.83,70.16,70.46,70.73,70.98,71.2,71.41,71.61,71.8,72.0,72.1,72.5,72.9,73.0,73.4,73.8,74.0,74.2,73.7,74.3,74.8,75.3,75.7,76.1,76.3,77.0,77.6,78.2,78.7,78.8,79.0,79.1,79.1,79.1,79.1 +Bangladesh,42.58,42.87,43.19,43.54,43.91,44.3,44.73,45.19,45.68,46.2,46.73,47.28,47.81,48.29,48.6,48.63,48.37,47.83,47.09,46.31,45.74,45.52,45.77,46.49,47.58,48.92,50.27,51.47,52.44,53.18,53.72,54.15,54.57,55.0,55.47,55.96,56.46,56.94,57.42,57.9,56.4,59.7,60.5,61.2,61.6,62.4,63.2,63.9,64.6,64.9,65.4,65.8,66.3,66.8,67.1,67.5,67.7,68.3,68.6,68.8,69.3,69.4,69.8,70.1,70.4,70.7 +Barbados,56.82,57.41,57.99,58.56,59.13,59.67,60.22,60.76,61.28,61.8,62.31,62.79,63.27,63.74,64.2,64.64,65.08,65.5,65.91,66.31,66.71,67.09,67.47,67.83,68.17,68.53,68.87,69.22,69.57,69.91,70.25,70.58,70.91,71.23,71.54,71.85,72.14,72.43,72.72,73.0,73.2,73.2,73.1,73.0,73.3,73.7,73.9,74.1,74.2,74.0,74.4,74.6,74.8,74.9,75.0,75.0,75.1,75.3,75.3,75.2,75.2,75.4,75.5,75.6,75.7,75.8 +Belarus,65.11,65.54,65.96,66.37,66.77,67.16,67.52,67.88,68.82,71.59,72.3,71.01,71.66,73.17,72.7,73.05,72.78,72.88,72.47,71.94,72.56,72.26,72.29,72.57,71.63,71.46,71.39,71.23,70.82,70.57,70.84,70.95,70.73,70.09,70.28,71.66,71.55,71.28,71.05,70.5,70.1,69.6,68.9,68.6,68.2,68.1,68.0,67.9,67.7,68.1,68.0,67.9,68.2,68.5,68.7,69.1,69.7,70.0,70.1,70.2,70.3,70.4,70.6,70.7,71.0,71.3 +Belgium,66.77,67.97,68.33,68.59,68.54,68.83,69.19,69.88,70.28,69.59,70.46,70.19,70.0,70.66,70.51,70.58,70.86,70.55,70.63,70.89,71.01,71.35,71.56,71.91,71.9,72.05,72.7,72.64,73.13,73.18,73.59,73.81,73.81,74.31,74.41,74.61,75.22,75.53,75.59,76.0,76.2,76.3,76.5,76.6,76.9,77.2,77.4,77.5,77.7,77.8,78.0,78.2,78.5,79.0,79.1,79.5,79.5,79.6,79.8,80.1,80.2,80.3,80.4,80.5,80.5,80.5 +Belize,55.15,55.7,56.27,56.82,57.37,57.91,58.46,58.99,59.54,60.08,60.64,61.2,61.78,62.36,62.95,63.53,64.11,64.67,65.21,65.72,66.21,66.66,67.11,67.52,67.93,68.32,68.7,69.06,69.43,69.78,70.13,70.47,70.8,71.09,71.34,71.51,71.6,71.61,71.54,71.4,71.2,71.1,70.8,70.6,70.5,70.4,69.7,69.5,69.3,69.0,68.8,69.3,69.6,69.9,70.0,70.3,70.6,70.7,70.9,71.2,71.2,71.3,71.3,71.5,71.7,71.9 +Benin,33.53,34.09,34.64,35.19,35.72,36.25,36.77,37.28,37.79,38.29,38.8,39.32,39.85,40.38,40.93,41.5,42.09,42.69,43.31,43.93,44.55,45.16,45.77,46.36,46.93,47.46,47.96,48.43,48.88,49.34,49.84,50.38,50.97,51.62,52.33,53.09,53.89,54.67,55.42,56.1,56.3,56.6,56.9,56.8,56.7,56.6,56.9,57.0,57.1,57.2,57.4,57.7,57.9,58.2,58.6,58.9,59.2,59.7,60.4,60.8,61.1,61.4,61.7,62.0,62.3,62.6 +Bhutan,30.94,31.47,32.01,32.56,33.12,33.68,34.25,34.81,35.38,35.94,36.49,37.04,37.57,38.12,38.68,39.28,39.94,40.66,41.45,42.31,43.23,44.2,45.18,46.2,47.21,48.22,49.22,50.21,51.18,52.12,53.05,53.96,54.87,55.78,56.69,57.61,58.54,59.48,60.44,61.4,61.9,62.4,62.8,63.1,63.8,64.7,65.1,65.6,66.5,65.9,67.5,68.1,68.5,68.9,69.3,69.8,70.3,70.7,70.9,71.4,71.7,71.9,72.2,72.4,72.7,73.0 +Bolivia,40.6,40.94,41.28,41.64,41.98,42.34,42.7,43.05,43.41,43.77,44.14,44.5,44.88,45.24,45.62,45.99,46.34,46.69,47.05,47.44,47.86,48.34,48.89,49.5,50.19,50.93,51.73,52.54,53.38,54.21,55.04,55.87,56.67,57.47,58.22,58.96,59.65,60.33,60.98,61.6,62.2,62.7,63.2,63.8,64.4,65.1,65.6,66.3,66.9,67.6,68.3,68.7,69.3,69.8,70.2,70.6,70.9,71.2,71.6,71.8,72.1,72.4,72.7,72.9,73.2,73.5 +Bosnia and 
Herzegovina,53.22,54.49,55.7,56.85,57.94,58.97,59.95,60.87,61.74,62.56,63.34,64.07,64.78,65.46,66.14,66.81,67.47,68.14,68.82,69.49,70.17,70.84,71.49,72.12,72.71,73.24,73.71,74.12,74.48,74.82,75.2,75.65,76.15,76.63,76.95,76.89,76.37,75.39,74.07,72.7,72.7,68.0,68.3,71.1,67.0,73.8,74.4,74.8,75.3,75.7,76.2,76.4,76.7,76.9,77.0,77.1,77.3,77.5,77.7,77.9,78.2,78.4,78.6,78.7,78.9,79.1 +Botswana,46.87,47.27,47.66,48.05,48.45,48.84,49.23,49.61,49.99,50.34,50.7,51.02,51.35,51.67,52.0,52.36,52.77,53.23,53.73,54.3,54.9,55.54,56.18,56.82,57.45,58.07,58.65,59.21,59.74,60.24,60.73,61.21,61.67,62.08,62.44,62.7,62.85,62.85,62.69,62.3,62.0,61.2,60.1,58.6,56.8,54.8,52.9,50.9,49.2,47.6,46.5,45.6,45.7,46.9,49.3,51.2,52.4,53.2,54.3,55.6,56.5,56.5,56.9,57.3,58.7,60.13 +Brazil,50.59,51.1,51.62,52.14,52.66,53.19,53.71,54.23,54.75,55.27,55.78,56.27,56.75,57.21,57.66,58.07,58.49,58.91,59.31,59.73,60.14,60.56,60.98,61.41,61.84,62.27,62.68,63.07,63.45,63.81,64.18,64.55,64.94,65.34,65.76,66.18,66.6,67.04,67.47,67.9,68.1,68.3,68.5,68.8,69.0,69.3,69.6,69.9,70.3,70.7,71.1,71.4,71.7,72.0,72.4,72.7,73.0,73.2,73.4,73.6,73.8,74.0,74.1,74.3,74.4,74.5 +Brunei,56.99,57.6,58.22,58.83,59.45,60.07,60.7,61.31,61.93,62.52,63.11,63.67,64.21,64.72,65.21,65.67,66.12,66.54,66.97,67.38,67.79,68.19,68.58,68.95,69.32,69.67,70.01,70.33,70.65,70.95,71.25,71.54,71.84,72.12,72.41,72.69,72.98,73.26,73.54,73.8,73.8,74.0,74.2,74.4,74.7,74.9,75.2,75.6,75.8,75.9,76.1,76.3,76.5,76.7,76.7,76.8,76.8,76.9,77.0,77.1,76.9,76.9,76.9,77.1,77.1,77.1 +Bulgaria,60.65,59.62,64.16,64.43,64.84,65.24,66.64,68.74,66.6,69.22,70.26,69.55,70.38,71.18,71.35,71.28,70.47,71.3,70.48,71.32,70.93,70.96,71.4,71.26,71.11,71.44,70.88,71.24,71.34,71.17,71.56,71.16,71.33,71.43,71.15,71.63,71.42,71.49,71.55,71.4,71.3,71.2,71.1,70.9,71.0,70.9,70.6,71.0,71.4,71.6,71.8,72.1,72.3,72.5,72.6,72.7,72.9,73.2,73.5,73.7,74.2,74.5,74.6,74.7,74.8,74.9 +Burkina Faso,30.65,31.18,31.69,32.21,32.72,33.21,33.71,34.21,34.71,35.21,35.72,36.23,36.75,37.27,37.8,38.3,38.8,39.3,39.78,40.27,40.75,41.25,41.78,42.36,43.0,43.74,44.61,45.56,46.58,47.61,48.58,49.45,50.17,50.71,51.08,51.28,51.38,51.42,51.42,51.4,51.4,51.3,51.3,51.3,51.3,51.5,51.6,51.8,52.2,52.6,53.2,53.8,54.5,55.1,55.9,56.6,57.4,58.0,58.5,59.0,59.5,59.9,60.3,60.6,60.9,61.2 +Burundi,38.19,38.45,38.72,38.98,39.25,39.51,39.77,40.04,40.3,40.58,40.85,41.13,41.41,41.69,41.95,42.18,42.37,42.52,42.65,42.76,42.91,43.1,43.35,43.66,44.02,44.43,44.84,45.24,45.6,45.93,46.22,46.49,46.75,46.95,47.08,47.05,46.88,46.54,46.1,45.6,45.4,45.3,45.1,45.0,44.5,44.3,45.0,45.5,46.3,46.7,48.4,49.8,51.3,53.0,54.7,56.4,57.9,59.1,60.0,60.4,60.8,61.1,61.3,61.4,61.4,61.4 +Cambodia,40.5,40.81,41.08,41.32,41.52,41.7,41.86,41.99,42.14,42.29,42.47,42.7,42.95,43.2,43.45,43.73,44.0,44.13,44.03,43.28,41.67,39.73,37.58,34.94,21.69,19.04,18.1,19.55,21.91,28.16,38.0,44.24,49.43,53.22,55.5,56.49,56.82,56.99,57.22,57.6,57.9,58.2,58.1,58.0,58.1,58.3,58.7,59.0,59.5,60.0,60.8,61.6,62.4,63.2,64.0,64.8,65.4,66.1,66.6,67.0,67.6,68.2,68.7,69.1,69.4,69.7 +Cameroon,39.08,39.51,39.94,40.41,40.87,41.37,41.88,42.39,42.93,43.46,44.0,44.53,45.07,45.59,46.13,46.67,47.22,47.79,48.37,48.97,49.59,50.22,50.85,51.49,52.13,52.74,53.36,53.95,54.52,55.06,55.56,56.03,56.45,56.83,57.17,57.48,57.75,58.01,58.22,58.4,58.2,57.9,57.4,57.0,56.5,56.2,55.5,55.0,54.7,54.3,54.2,54.2,54.3,54.4,54.9,55.4,55.7,56.6,57.3,57.8,58.1,58.5,59.0,59.1,59.4,59.7 
+Canada,68.53,68.72,69.1,69.96,70.02,70.0,69.92,70.58,70.62,71.0,71.22,71.25,71.26,71.64,71.74,71.86,72.07,72.23,72.39,72.58,72.91,72.81,73.04,73.12,73.41,73.84,74.13,74.46,74.81,75.05,75.46,75.67,76.04,76.33,76.31,76.46,76.76,76.82,77.09,77.4,77.6,77.7,77.8,77.9,78.0,78.3,78.6,78.8,79.0,79.2,79.5,79.6,79.8,80.1,80.2,80.5,80.6,80.8,81.1,81.3,81.6,81.6,81.6,81.7,81.7,81.7 +Cape Verde,48.45,48.63,48.81,49.0,49.19,49.38,49.57,49.76,49.95,50.12,50.27,50.43,50.59,50.77,51.0,51.32,51.75,52.32,53.0,53.78,54.65,55.57,56.5,57.41,58.3,59.16,60.0,60.82,61.62,62.41,63.19,63.95,64.69,65.43,66.12,66.75,67.33,67.85,68.3,68.7,68.6,68.6,68.4,68.3,68.3,68.2,68.2,68.2,68.2,68.4,68.6,68.7,68.9,69.1,69.3,69.6,69.6,70.4,70.7,71.1,71.4,71.9,72.3,72.7,72.9,73.1 +Central African Republic,33.34,33.79,34.26,34.72,35.18,35.62,36.07,36.53,36.97,37.43,37.89,38.36,38.85,39.36,39.92,40.5,41.15,41.84,42.57,43.36,44.19,45.04,45.91,46.77,47.6,48.36,49.07,49.7,50.21,50.61,50.86,50.96,50.95,50.81,50.57,50.21,49.8,49.34,48.86,48.4,48.1,48.0,47.5,47.2,46.7,46.3,45.9,45.7,45.5,45.3,45.2,45.2,45.2,45.4,45.5,45.8,46.2,46.8,47.6,47.9,48.1,48.5,47.8,48.2,49.6,51.04 +Chad,37.29,37.69,38.09,38.49,38.9,39.31,39.72,40.14,40.54,40.95,41.35,41.76,42.17,42.58,43.01,43.48,43.98,44.54,45.12,45.72,46.33,46.91,47.47,47.98,48.45,48.89,49.31,49.72,50.14,50.56,50.97,51.38,51.78,52.15,52.51,52.81,53.09,53.33,53.52,53.7,54.3,53.9,54.0,53.6,53.6,53.0,52.5,52.1,51.7,51.5,51.7,51.9,52.1,52.6,53.0,53.1,54.0,54.3,55.2,55.8,56.1,56.3,56.6,56.8,57.4,58.01 +Channel Islands,68.71,69.09,69.43,69.72,69.97,70.19,70.37,70.52,70.64,70.74,70.83,70.93,71.03,71.14,71.27,71.39,71.51,71.62,71.73,71.82,71.92,72.02,72.13,72.26,72.41,72.58,72.77,72.98,73.21,73.44,73.67,73.89,74.1,74.3,74.49,74.68,74.87,75.07,75.29,75.51,75.73,75.94,76.14,76.34,76.53,76.72,76.92,77.14,77.37,77.61,77.87,78.14,78.41,78.67,78.93,79.16,79.38,79.57,79.75,79.9,80.05,80.19,80.32,80.47,80.61,80.75 +Chile,54.35,54.56,54.79,55.03,55.29,55.57,55.86,56.16,56.5,56.85,57.23,57.63,58.07,58.54,59.03,59.54,60.07,60.61,61.17,61.74,62.34,62.98,63.63,64.31,65.02,65.75,66.5,67.25,67.99,68.7,69.36,69.97,70.51,71.0,71.42,71.8,72.14,72.47,72.79,73.1,74.1,75.0,75.2,75.3,75.4,75.7,76.2,76.6,76.9,77.3,77.4,77.7,77.8,78.0,78.2,78.2,78.3,78.5,78.5,78.5,78.9,79.1,79.1,79.2,79.4,79.6 +China,41.98,42.91,43.85,45.7,47.2,49.57,49.62,49.17,37.36,30.53,32.95,43.29,50.64,52.0,54.28,55.37,56.9,57.87,59.38,61.0,62.04,61.36,60.97,60.63,60.78,60.46,61.94,62.15,62.95,63.92,64.2,65.28,65.49,65.68,65.87,66.05,66.23,66.39,66.56,66.7,67.0,67.2,67.5,67.9,68.4,68.8,69.1,69.4,69.6,69.8,70.0,70.2,70.9,71.4,71.9,72.6,73.1,73.4,73.9,74.3,74.9,75.3,75.7,75.9,76.2,76.5 +Colombia,49.7,50.93,52.08,53.16,54.15,55.07,55.91,56.69,57.39,58.03,58.63,59.18,59.71,60.21,60.7,61.16,61.6,62.03,62.43,62.83,63.23,63.64,64.08,64.53,65.04,65.58,66.17,66.79,67.43,68.07,68.67,69.24,69.72,70.13,70.48,70.74,70.96,71.14,71.32,71.5,71.1,71.1,71.4,71.6,72.0,72.2,72.8,73.1,73.2,73.3,73.5,73.7,74.5,74.7,75.1,75.3,75.9,76.2,76.2,76.4,77.0,77.3,77.5,77.8,78.0,78.2 +Comoros,40.58,40.91,41.25,41.61,41.99,42.38,42.78,43.19,43.61,44.04,44.47,44.89,45.32,45.75,46.18,46.63,47.1,47.58,48.09,48.61,49.12,49.63,50.12,50.59,51.03,51.46,51.89,52.3,52.72,53.15,53.59,54.03,54.48,54.93,55.36,55.77,56.15,56.5,56.81,57.1,57.4,57.8,58.2,58.5,58.9,58.4,59.4,60.0,61.4,62.1,63.0,63.8,64.8,65.5,66.0,66.3,66.6,67.1,66.7,67.7,67.2,67.6,67.8,68.0,68.1,68.2 +"Congo, Dem. 
Rep.",40.07,40.58,41.06,41.53,41.97,42.39,42.79,43.17,43.54,43.9,44.25,44.61,44.98,45.36,45.77,46.2,46.66,47.14,47.63,48.13,48.6,49.05,49.46,49.83,50.17,50.49,50.8,51.11,51.43,51.76,52.09,52.41,52.72,53.0,53.28,53.55,53.81,54.07,54.31,54.5,54.4,54.3,54.3,54.3,54.0,51.8,53.2,53.5,54.0,54.3,54.5,54.7,54.9,55.9,56.4,56.8,57.1,57.5,57.9,58.4,58.8,59.1,59.6,60.1,60.8,61.51 +"Congo, Rep.",41.81,42.56,43.32,44.05,44.78,45.5,46.21,46.92,47.6,48.25,48.88,49.47,50.04,50.55,51.02,51.45,51.84,52.21,52.54,52.85,53.14,53.42,53.69,53.94,54.2,54.45,54.71,54.97,55.22,55.45,55.65,55.81,55.93,55.98,55.94,55.79,55.54,55.21,54.78,54.3,54.4,54.4,53.5,53.2,52.6,52.2,46.3,49.9,51.6,52.5,53.5,54.3,55.0,55.8,56.7,57.8,58.3,58.8,59.8,60.4,60.9,61.3,61.5,61.5,61.5,61.5 +Costa Rica,56.6,57.19,57.79,58.38,58.98,59.57,60.17,60.77,61.37,61.97,62.56,63.13,63.7,64.26,64.8,65.33,65.85,66.35,66.84,67.34,67.86,68.4,68.95,69.53,70.12,70.75,71.38,72.0,72.62,73.2,73.73,74.22,74.66,75.04,75.37,75.66,75.9,76.14,76.37,76.6,76.5,76.6,76.6,76.7,76.8,76.8,77.0,77.2,77.5,77.7,78.0,78.2,78.4,78.7,79.0,79.3,79.6,79.8,79.8,79.8,79.9,80.0,80.1,80.2,80.3,80.4 +Cote d'Ivoire,32.0,32.54,33.1,33.71,34.36,35.03,35.75,36.49,37.24,38.0,38.74,39.46,40.17,40.84,41.51,42.21,42.93,43.7,44.53,45.38,46.27,47.15,48.02,48.85,49.63,50.37,51.06,51.7,52.31,52.87,53.38,53.87,54.31,54.69,55.02,55.26,55.43,55.5,55.46,55.3,54.9,54.4,53.7,53.2,52.5,52.3,52.3,52.2,52.2,52.0,52.1,52.3,52.6,52.8,53.4,54.1,54.9,55.4,56.0,56.6,57.0,57.5,58.1,58.5,59.1,59.71 +Croatia,60.57,61.08,61.6,62.1,62.58,63.06,63.52,63.98,64.41,64.85,65.26,65.66,66.05,66.43,66.8,67.16,67.52,67.87,68.22,68.54,68.86,69.14,69.4,69.63,69.83,70.0,70.16,70.3,70.42,70.56,70.71,70.89,71.08,71.31,71.54,71.78,72.0,72.22,72.4,72.6,71.9,72.3,72.9,73.4,73.0,73.4,73.4,73.5,73.8,74.2,74.6,74.9,75.1,75.3,75.7,75.9,76.0,76.2,76.4,76.7,77.1,77.4,77.6,77.8,77.8,77.8 +Cuba,58.53,59.12,59.71,60.29,60.89,61.48,62.07,62.66,63.25,63.85,64.47,65.09,65.71,66.35,66.99,67.6,68.2,68.78,69.32,69.84,70.34,70.82,71.29,71.74,72.18,72.59,72.96,73.3,73.59,73.84,74.05,74.22,74.36,74.48,74.57,74.62,74.65,74.67,74.67,74.7,74.8,74.7,74.7,74.8,75.0,75.2,75.4,75.6,75.8,76.2,76.4,76.8,76.9,77.0,77.1,77.3,77.5,77.6,77.7,77.8,77.9,78.0,78.0,78.1,78.2,78.3 +Cyprus,66.13,66.58,67.03,67.45,67.87,68.26,68.65,69.01,69.38,69.72,70.06,70.38,70.71,71.02,71.33,71.62,71.92,72.19,72.47,72.73,72.99,73.23,73.47,73.7,73.93,74.15,74.37,74.58,74.79,74.99,75.19,75.38,75.58,75.76,75.95,76.12,76.3,76.47,76.64,76.8,76.4,76.7,76.8,76.4,76.7,77.1,77.1,77.1,77.5,77.7,78.5,78.7,79.0,79.1,79.0,79.5,79.8,80.0,80.3,80.6,81.1,81.5,81.7,81.7,81.8,81.9 +Czech Republic,65.32,66.94,67.64,68.14,69.06,69.47,69.14,70.05,70.04,70.58,70.77,70.04,70.56,70.73,70.43,70.65,70.55,70.11,69.62,69.72,69.96,70.49,70.33,70.42,70.77,70.88,70.94,71.02,71.13,70.67,71.11,71.22,71.0,71.26,71.48,71.42,71.87,72.08,72.13,71.8,72.0,72.3,72.7,73.0,73.4,73.8,74.2,74.5,74.7,75.0,75.3,75.4,75.6,75.9,76.2,76.5,76.8,77.1,77.3,77.5,77.8,78.1,78.3,78.6,78.8,79.0 +Denmark,70.97,70.82,71.2,71.4,71.97,72.11,71.87,72.3,72.29,72.28,72.55,72.43,72.52,72.61,72.49,72.57,73.06,73.27,73.36,73.49,73.55,73.59,73.83,73.96,74.24,73.91,74.82,74.59,74.41,74.3,74.44,74.78,74.65,74.81,74.68,74.86,74.97,75.06,75.1,75.1,75.4,75.4,75.4,75.4,75.6,75.9,76.2,76.7,76.3,77.1,77.2,77.2,77.6,77.8,78.3,78.3,78.4,78.9,79.1,79.4,79.9,80.3,80.3,80.3,80.4,80.5 
+Djibouti,41.48,41.89,42.31,42.77,43.23,43.71,44.21,44.73,45.24,45.77,46.28,46.79,47.3,47.8,48.33,48.9,49.53,50.23,50.99,51.75,52.51,53.2,53.83,54.38,54.85,55.29,55.71,56.15,56.61,57.1,57.59,58.08,58.55,58.97,59.38,59.74,60.09,60.42,60.72,61.0,60.7,60.4,60.7,60.0,60.4,60.3,60.1,60.0,59.9,60.0,60.1,60.2,60.3,60.4,60.7,60.7,61.5,61.8,62.1,62.3,62.5,62.8,63.1,63.1,63.8,64.51 +Dominican Republic,45.6,46.5,47.39,48.27,49.15,50.01,50.87,51.71,52.54,53.37,54.17,54.97,55.75,56.52,57.28,58.02,58.75,59.47,60.16,60.83,61.47,62.09,62.67,63.23,63.75,64.25,64.73,65.19,65.65,66.12,66.6,67.11,67.63,68.18,68.75,69.34,69.96,70.58,71.2,71.8,72.2,72.5,72.5,72.5,72.6,72.6,72.9,72.9,73.2,73.3,73.4,73.5,73.5,73.1,73.3,73.5,73.7,74.1,74.3,74.4,74.6,74.7,74.9,75.1,75.3,75.5 +Ecuador,48.06,48.64,49.23,49.87,50.54,51.23,51.93,52.65,53.38,54.09,54.77,55.42,56.01,56.53,57.02,57.47,57.89,58.32,58.76,59.21,59.67,60.16,60.67,61.18,61.73,62.3,62.9,63.51,64.16,64.82,65.49,66.17,66.85,67.53,68.18,68.83,69.46,70.06,70.64,71.2,71.4,71.7,71.8,72.2,72.3,72.5,72.7,72.8,73.1,73.2,73.4,73.6,73.7,73.9,74.1,74.3,74.5,74.7,74.9,75.1,75.3,75.5,75.6,75.8,75.9,76.0 +Egypt,39.32,40.72,42.03,43.22,44.3,45.29,46.17,46.97,47.68,48.31,48.89,49.43,49.94,50.42,50.88,51.29,51.65,51.97,52.25,52.54,52.88,53.31,53.84,54.46,55.17,55.93,56.69,57.45,58.16,58.85,59.52,60.21,60.93,61.65,62.38,63.07,63.7,64.27,64.76,65.2,65.4,66.1,66.4,66.7,67.4,67.9,68.2,68.6,69.0,69.7,69.7,69.8,69.8,69.9,70.1,70.1,70.3,70.2,70.1,70.1,70.4,70.5,71.0,71.3,71.5,71.7 +El Salvador,44.11,45.06,45.99,46.9,47.8,48.68,49.55,50.39,51.22,52.02,52.77,53.5,54.18,54.81,55.4,55.93,56.41,56.84,57.24,57.57,57.85,58.07,58.22,58.33,58.36,58.33,58.22,58.09,57.98,57.96,58.13,58.53,59.19,60.11,61.24,62.54,63.91,65.28,66.55,67.7,68.1,68.9,69.3,69.6,70.0,70.3,70.8,71.0,71.6,71.9,71.7,72.5,72.6,72.8,73.0,73.3,73.5,73.7,73.8,74.1,74.3,74.5,74.6,74.8,74.9,75.0 +Equatorial Guinea,34.55,34.9,35.25,35.59,35.95,36.3,36.65,36.99,37.34,37.69,38.04,38.38,38.73,39.08,39.44,39.78,40.13,40.48,40.82,41.17,41.52,41.87,42.21,42.56,42.91,43.28,43.65,44.04,44.44,44.85,45.26,45.69,46.11,46.52,46.92,47.33,47.73,48.14,48.52,48.9,48.7,48.7,48.6,48.5,48.5,48.9,50.3,51.2,52.0,52.9,54.0,54.9,55.3,55.9,56.0,56.8,57.1,57.5,58.0,58.6,58.7,59.4,60.5,61.0,61.0,61.0 +Eritrea,36.47,36.75,37.02,37.29,37.58,37.86,38.14,38.42,38.73,39.03,39.35,39.69,40.04,40.41,40.81,41.22,41.66,42.1,42.56,43.02,43.47,43.92,44.35,44.75,45.14,45.49,45.8,46.09,46.38,46.66,46.97,47.33,47.74,48.21,48.77,49.38,50.06,50.8,51.58,52.4,53.4,54.9,56.2,57.0,57.8,58.4,59.0,58.8,52.2,37.6,59.9,60.0,59.9,60.0,59.9,60.0,60.1,60.1,60.1,60.1,60.2,60.3,60.4,60.6,60.7,60.8 +Estonia,59.91,61.13,63.7,65.05,65.73,67.36,67.84,68.29,68.72,69.42,69.74,69.93,69.99,70.74,70.81,70.78,71.08,70.7,70.4,70.51,70.71,70.48,70.83,70.94,70.26,69.88,70.01,69.87,69.66,69.75,69.62,70.03,69.95,69.83,69.97,71.11,71.13,71.17,70.73,70.1,69.6,69.3,68.2,66.3,67.7,69.8,70.0,69.5,70.2,70.4,70.0,70.9,71.5,72.0,72.5,72.9,73.0,74.2,74.9,76.4,76.3,76.7,77.5,77.6,77.8,78.0 +Ethiopia,33.09,33.41,33.8,34.23,34.72,35.25,32.41,30.37,37.08,37.72,38.35,38.94,39.49,39.36,38.13,39.09,41.09,41.38,41.65,41.9,42.14,41.98,39.85,37.71,38.78,42.86,42.41,42.07,42.74,42.8,42.87,42.93,42.5,39.46,35.43,41.39,43.95,44.4,44.82,45.2,46.9,47.8,48.4,48.8,49.2,50.0,50.6,51.1,50.6,52.1,52.7,53.6,54.3,55.2,56.1,57.2,58.6,60.0,61.2,62.1,62.9,63.6,64.2,64.7,65.2,65.7 
+Fiji,51.3,51.85,52.38,52.9,53.4,53.89,54.36,54.81,55.26,55.7,56.12,56.54,56.94,57.35,57.75,58.14,58.52,58.89,59.26,59.61,59.96,60.29,60.6,60.91,61.21,61.5,61.8,62.09,62.37,62.65,62.92,63.2,63.46,63.71,63.96,64.2,64.43,64.66,64.88,65.1,65.1,65.0,64.8,64.7,64.5,64.3,64.1,64.2,64.1,64.2,64.4,64.5,64.6,64.7,64.8,64.8,64.9,64.9,64.9,65.2,65.3,65.4,65.6,65.7,65.8,65.9 +Finland,65.68,66.56,66.63,67.59,67.39,68.01,67.51,68.65,68.83,69.03,69.07,68.78,69.19,69.4,69.16,69.68,69.86,69.82,69.7,70.4,70.22,70.91,71.42,71.34,71.89,72.04,72.56,73.13,73.42,73.71,74.03,74.6,74.51,74.82,74.49,74.86,74.89,74.85,75.07,75.1,75.4,75.7,76.0,76.4,76.7,76.8,77.1,77.3,77.5,77.8,78.1,78.3,78.5,78.8,79.0,79.2,79.4,79.6,79.8,80.0,80.3,80.5,80.8,80.9,80.9,80.9 +France,66.17,67.46,67.4,68.27,68.54,68.57,69.0,70.24,70.27,70.49,71.07,70.61,70.46,71.43,71.26,71.67,71.67,71.66,71.4,72.29,72.27,72.52,72.69,73.04,73.13,73.38,73.99,74.12,74.43,74.53,74.69,75.07,75.06,75.56,75.67,75.95,76.55,76.78,76.91,77.2,77.3,77.6,77.7,78.0,78.2,78.4,78.7,78.7,78.8,79.1,79.2,79.4,79.6,80.2,80.4,80.7,81.0,81.1,81.2,81.4,81.6,81.6,81.7,81.7,81.8,81.9 +French Guiana,52.52,53.05,53.58,54.12,54.67,55.22,55.78,56.37,57.0,57.68,58.44,59.28,60.19,61.14,62.1,63.0,63.8,64.46,64.97,65.34,65.57,65.71,65.81,65.91,66.04,66.24,66.51,66.87,67.3,67.79,68.31,68.83,69.33,69.79,70.2,70.57,70.92,71.27,71.6,71.94,72.27,72.61,72.93,73.25,73.56,73.84,74.1,74.34,74.55,74.75,74.92,75.07,75.21,75.35,75.5,75.65,75.82,76.01,76.21,76.43,76.65,76.89,77.12,77.35,77.58,77.81 +French Polynesia,46.52,48.28,49.86,51.27,52.5,53.55,54.44,55.18,55.78,56.28,56.71,57.09,57.47,57.85,58.24,58.65,59.06,59.45,59.83,60.18,60.52,60.84,61.15,61.47,61.82,62.23,62.72,63.28,63.9,64.56,65.22,65.84,66.39,66.87,67.27,67.59,67.88,68.15,68.42,68.7,69.01,69.33,69.68,70.05,70.43,70.82,71.21,71.59,71.96,72.31,72.67,73.03,73.4,73.77,74.13,74.48,74.81,75.11,75.38,75.62,75.84,76.05,76.26,76.47,76.69,76.91 +Gabon,35.84,36.34,36.8,37.19,37.54,37.83,38.1,38.33,38.56,38.83,39.15,39.56,40.07,40.7,41.42,42.21,43.06,43.9,44.74,45.55,46.35,47.13,47.9,48.68,49.45,50.23,51.01,51.81,52.61,53.42,54.24,55.07,55.88,56.66,57.4,58.04,58.58,59.0,59.32,59.5,59.8,60.2,60.1,59.9,59.8,59.6,59.9,60.0,59.7,59.3,59.0,59.4,59.4,59.4,60.1,60.9,61.6,61.7,62.1,63.0,63.3,63.9,64.4,65.0,65.9,66.81 +Gambia,31.85,32.33,32.78,33.22,33.65,34.06,34.46,34.86,35.27,35.7,36.16,36.68,37.26,37.91,38.66,39.47,40.36,41.3,42.3,43.31,44.36,45.42,46.47,47.51,48.56,49.58,50.6,51.61,52.62,53.61,54.59,55.55,56.48,57.37,58.21,58.97,59.65,60.26,60.81,61.3,61.5,61.5,62.0,62.3,62.6,62.8,63.1,63.4,63.4,63.6,63.9,63.8,64.4,64.7,64.9,65.2,65.3,65.7,66.0,66.5,67.1,67.5,67.8,68.0,68.1,68.2 +Georgia,59.96,60.36,60.75,61.15,61.54,61.93,62.32,62.72,63.11,63.5,63.9,64.31,64.71,65.11,65.52,65.9,66.26,66.6,66.93,67.24,67.54,67.85,68.17,68.47,68.76,69.0,69.19,69.31,69.37,69.4,69.42,69.46,69.52,69.62,69.75,69.86,69.94,69.96,69.95,69.9,69.9,69.4,69.2,70.2,70.7,71.2,71.3,71.4,71.4,71.4,71.7,71.6,71.7,71.5,71.8,71.9,72.1,71.8,72.1,72.2,72.2,72.4,72.5,72.6,72.9,73.2 +Germany,67.08,67.4,67.7,68.0,68.28,68.57,68.49,69.23,69.34,69.26,69.85,70.01,70.1,70.66,70.65,70.77,70.99,70.64,70.48,70.72,70.94,71.16,71.41,71.71,71.56,72.02,72.63,72.6,72.96,73.14,73.37,73.69,73.97,74.44,74.55,74.75,75.15,75.33,75.51,75.4,75.6,76.0,76.1,76.4,76.6,76.9,77.3,77.6,77.8,78.1,78.4,78.6,78.8,79.2,79.4,79.7,79.9,80.0,80.1,80.3,80.5,80.6,80.7,80.7,80.8,80.9 
+Ghana,41.66,42.22,42.76,43.3,43.83,44.36,44.87,45.37,45.86,46.34,46.8,47.25,47.66,48.07,48.44,48.8,49.14,49.46,49.78,50.08,50.39,50.7,51.02,51.35,51.68,52.0,52.33,52.63,52.95,53.26,53.6,53.95,54.34,54.76,55.23,55.75,56.31,56.89,57.47,58.0,58.4,58.7,59.5,59.6,60.0,60.1,59.8,60.1,60.1,60.0,59.9,60.0,60.2,60.5,60.8,61.2,61.6,62.0,62.4,62.9,63.5,64.1,64.5,64.8,65.3,65.8 +Greece,65.57,65.72,65.92,66.16,66.46,66.79,67.16,67.57,67.99,68.41,68.8,69.14,69.44,69.69,69.91,70.12,70.34,70.59,70.88,71.2,71.53,71.85,72.13,72.39,72.62,72.85,73.1,73.38,73.68,74.01,74.33,74.64,74.94,75.21,75.47,75.73,76.01,76.32,76.66,77.0,77.1,77.1,77.5,77.7,77.8,77.9,78.1,78.2,78.3,78.6,78.9,79.1,79.3,79.4,79.6,80.0,79.8,80.2,80.2,80.4,80.5,80.6,81.0,81.0,81.0,81.0 +Greenland,43.94,45.59,48.67,51.76,54.85,57.94,58.82,59.71,60.6,61.49,61.85,62.22,62.59,62.97,63.34,63.71,64.08,64.45,64.82,65.19,65.01,64.84,64.66,64.49,64.31,64.14,63.96,63.78,63.61,63.09,62.71,62.8,62.89,63.05,63.42,63.81,64.22,64.14,64.22,64.6,65.1,65.5,65.9,66.3,66.5,66.8,66.9,67.2,67.5,67.8,68.0,68.3,68.5,68.8,69.1,69.5,70.0,70.3,70.6,70.8,71.2,71.6,71.8,72.0,72.1,72.2 +Grenada,55.81,56.39,56.97,57.52,58.07,58.61,59.12,59.63,60.11,60.59,61.05,61.49,61.93,62.35,62.76,63.16,63.54,63.91,64.27,64.62,64.97,65.29,65.62,65.92,66.22,66.52,66.79,67.07,67.33,67.6,67.86,68.1,68.35,68.59,68.83,69.06,69.28,69.5,69.7,69.9,70.2,70.2,70.0,70.4,70.7,70.8,70.8,70.6,70.6,70.5,70.3,70.2,70.2,69.3,70.3,70.5,70.7,70.8,70.9,71.0,71.0,71.1,71.2,71.4,71.5,71.6 +Guadeloupe,52.09,52.94,53.77,54.57,55.35,56.11,56.84,57.55,58.24,58.91,59.58,60.23,60.87,61.51,62.14,62.75,63.34,63.91,64.46,64.98,65.49,65.99,66.49,66.97,67.46,67.93,68.4,68.86,69.31,69.75,70.18,70.6,71.01,71.42,71.82,72.21,72.6,72.98,73.35,73.72,74.08,74.44,74.79,75.14,75.48,75.82,76.15,76.48,76.8,77.12,77.43,77.74,78.04,78.35,78.65,78.95,79.25,79.55,79.85,80.14,80.43,80.69,80.95,81.18,81.41,81.64 +Guam,56.53,57.04,57.55,58.08,58.6,59.12,59.65,60.18,60.71,61.24,61.76,62.28,62.79,63.29,63.78,64.26,64.72,65.18,65.63,66.06,66.49,66.9,67.3,67.7,68.07,68.43,68.79,69.13,69.47,69.8,70.12,70.42,70.73,71.02,71.3,71.58,71.84,72.09,72.35,72.6,72.4,72.4,72.5,72.7,73.0,73.2,69.4,73.4,73.5,73.6,73.6,73.6,73.5,73.3,73.1,72.7,72.4,72.1,71.8,71.6,71.5,71.5,71.6,71.6,71.7,71.8 +Guatemala,42.06,42.44,42.83,43.27,43.73,44.23,44.77,45.32,45.91,46.51,47.12,47.76,48.4,49.05,49.73,50.43,51.16,51.93,52.72,53.5,54.27,55.0,55.67,56.28,56.82,57.32,57.78,58.22,58.66,59.12,59.6,60.1,60.62,61.16,61.72,62.28,62.86,63.45,64.02,64.6,64.0,63.8,64.2,64.6,66.9,68.1,67.7,67.7,68.8,68.8,69.3,70.0,70.1,70.2,69.8,70.2,71.0,71.2,70.9,71.2,71.6,72.1,72.3,72.4,72.6,72.8 +Guinea,33.12,33.44,33.74,34.04,34.35,34.64,34.92,35.2,35.46,35.71,35.95,36.17,36.37,36.57,36.77,36.96,37.16,37.37,37.61,37.89,38.2,38.57,38.97,39.43,39.94,40.47,41.04,41.63,42.25,42.92,43.66,44.47,45.37,46.32,47.33,48.35,49.37,50.33,51.22,52.0,52.3,52.5,53.0,53.1,53.4,53.8,54.0,54.0,54.0,54.2,54.4,54.7,55.1,55.6,56.0,56.4,56.8,57.1,57.5,57.9,58.2,58.5,58.8,58.6,59.1,59.6 +Guinea-Bissau,39.65,40.03,40.42,40.81,41.2,41.58,41.97,42.36,42.75,43.14,43.39,43.64,43.89,44.15,44.39,44.63,44.86,45.09,45.29,45.5,45.71,45.91,46.12,46.33,46.54,46.77,47.02,47.27,47.54,47.83,48.13,48.45,48.78,49.13,49.49,49.87,50.26,50.67,51.09,51.5,51.7,51.8,52.0,52.2,52.3,52.6,52.8,51.7,52.5,52.8,52.7,52.7,52.8,52.8,52.9,53.0,53.2,53.6,53.9,54.3,54.5,54.8,55.1,55.3,55.6,55.9 
+Guyana,57.51,57.68,57.85,58.04,58.21,58.38,58.56,58.73,58.9,59.08,59.24,59.41,59.58,59.75,59.92,60.09,60.25,60.43,60.59,60.75,60.92,61.08,61.24,61.4,61.56,61.72,61.88,62.03,62.18,62.34,62.5,62.67,62.85,63.02,63.21,63.41,63.61,63.8,64.01,64.2,64.3,64.5,64.4,64.5,64.4,64.3,64.3,64.3,64.3,64.2,63.9,63.5,63.7,64.2,64.4,64.8,64.9,65.0,65.3,65.5,65.6,65.9,66.2,66.4,66.8,67.2 +Haiti,36.56,37.22,37.87,38.5,39.12,39.74,40.34,40.93,41.52,42.1,42.68,43.26,43.82,44.38,44.93,45.43,45.9,46.33,46.73,47.1,47.45,47.81,48.17,48.53,48.9,49.28,49.63,49.97,50.3,50.62,50.94,51.29,51.64,52.02,52.42,52.81,53.21,53.59,53.95,54.3,54.4,54.9,54.7,55.4,56.2,56.7,57.0,57.5,58.0,58.7,59.2,59.6,59.7,58.6,60.0,60.3,60.8,61.0,61.7,32.2,62.4,62.9,63.4,63.8,64.3,64.8 +Honduras,41.86,42.39,42.95,43.54,44.16,44.83,45.52,46.23,46.97,47.71,48.47,49.21,49.94,50.65,51.35,52.02,52.68,53.34,54.0,54.68,55.37,56.07,56.8,57.56,58.34,59.15,59.97,60.8,61.65,62.5,63.36,64.24,65.12,66.01,66.86,67.67,68.42,69.11,69.73,70.3,70.3,70.1,69.9,70.1,70.1,70.1,70.2,63.9,70.3,70.5,70.6,70.7,70.8,71.0,71.2,71.4,71.6,71.8,71.9,72.0,72.2,72.3,72.6,72.8,73.0,73.2 +"Hong Kong, China",62.38,62.9,63.43,63.98,64.54,65.11,65.69,66.28,66.87,67.45,68.01,68.55,69.05,69.52,69.96,70.37,70.77,71.15,71.53,71.89,72.25,72.58,72.9,73.2,73.49,73.78,74.06,74.35,74.64,74.93,75.22,75.5,75.77,76.03,76.28,76.53,76.78,77.02,77.27,77.52,77.77,78.01,78.25,78.48,78.72,78.99,79.29,79.63,79.99,80.36,80.73,81.08,81.4,81.68,81.92,82.12,82.31,82.49,82.66,82.84,83.02,83.2,83.38,83.56,83.73,83.9 +Hungary,62.48,64.05,63.89,65.46,66.91,66.07,66.44,67.45,67.35,68.13,69.06,68.0,69.02,69.52,69.22,69.98,69.55,69.38,69.45,69.29,69.19,69.82,69.69,69.41,69.46,69.75,70.02,69.56,69.77,69.18,69.24,69.47,69.03,69.07,69.01,69.22,69.69,70.09,69.53,69.5,69.2,69.1,69.2,69.5,70.1,70.5,70.9,71.1,71.3,71.8,72.3,72.6,72.7,72.9,73.1,73.3,73.6,73.9,74.3,74.6,75.0,75.5,76.1,76.5,76.7,76.9 +Iceland,71.12,72.57,72.39,73.45,73.4,73.08,73.58,73.55,72.78,74.22,73.6,73.82,73.13,73.72,74.0,73.4,73.9,74.12,73.9,74.0,73.75,74.66,74.52,74.59,75.57,76.94,76.35,76.66,76.88,76.92,76.61,77.26,76.91,77.71,77.85,78.38,77.53,77.39,78.46,78.3,78.4,78.6,78.8,79.1,78.9,79.4,79.6,79.9,80.2,80.5,80.8,81.0,81.3,81.5,81.7,81.8,82.1,82.4,82.5,82.8,82.9,83.1,83.2,83.3,83.3,83.3 +India,35.1,35.76,36.44,37.11,37.79,38.48,39.16,39.85,40.56,41.26,41.99,42.72,43.46,44.23,44.98,45.73,46.49,47.21,47.93,48.65,49.35,50.08,50.81,51.53,52.25,52.93,53.56,54.14,54.65,55.1,55.51,55.86,56.19,56.51,56.81,57.11,57.39,57.65,57.93,58.2,58.5,58.8,59.1,59.5,59.9,60.2,60.5,60.8,61.2,61.5,61.9,62.3,62.8,63.2,63.6,63.9,64.3,64.7,65.0,65.4,65.7,66.1,66.5,66.9,67.2,67.5 +Indonesia,36.99,37.93,38.86,39.78,40.68,41.57,42.45,43.32,44.17,45.01,45.83,46.65,47.45,48.24,43.77,44.18,50.54,51.27,52.0,52.71,53.4,54.09,54.75,55.41,56.04,56.67,57.27,57.87,58.45,59.01,59.57,60.12,60.64,61.16,61.66,62.15,62.63,63.1,63.55,64.0,64.5,64.9,65.3,65.7,66.1,66.4,66.7,67.0,67.2,67.5,67.8,68.0,68.2,66.7,68.7,68.9,69.2,69.4,69.6,69.8,70.1,70.3,70.6,70.8,71.1,71.4 +Iran,40.29,40.92,41.56,42.19,42.84,43.47,44.11,44.74,45.38,46.0,46.61,47.22,47.83,48.43,49.04,49.66,50.3,50.98,51.67,52.43,53.28,54.24,55.24,56.24,57.1,57.64,57.78,57.52,56.95,56.24,55.62,55.32,55.49,56.19,57.39,59.01,60.83,62.67,64.43,66.0,67.8,68.5,69.1,69.6,69.9,69.8,70.3,70.8,71.3,71.4,71.3,71.3,70.1,71.5,71.9,72.4,72.8,73.1,73.4,73.7,74.1,74.3,74.5,74.6,74.6,74.6 
+Iraq,35.08,36.58,38.04,39.45,40.81,42.11,43.38,44.61,45.79,46.96,48.11,49.24,50.36,51.46,52.52,53.51,54.42,55.24,55.96,56.61,57.21,57.77,58.31,58.81,59.19,59.35,59.26,58.94,58.44,57.91,57.52,57.41,57.68,58.34,59.36,60.65,62.05,63.41,64.65,65.7,63.9,65.4,65.4,65.4,65.3,65.3,65.2,65.7,65.9,65.8,66.4,66.1,66.1,66.3,65.7,65.1,65.3,66.6,67.1,67.3,67.7,68.1,68.3,67.7,67.4,67.1 +Ireland,65.07,67.52,68.3,68.44,68.46,69.43,69.51,69.84,69.99,70.76,70.24,70.57,70.85,71.12,71.35,70.89,71.95,71.67,71.62,71.68,72.5,71.86,72.11,72.08,72.68,72.81,72.98,72.98,73.28,73.66,74.04,74.34,74.4,74.87,74.84,74.93,75.76,75.82,75.87,76.3,76.7,76.8,76.9,77.3,77.1,77.5,77.4,77.6,77.7,77.8,78.4,78.8,79.1,79.3,79.7,79.8,80.1,80.1,80.3,81.0,80.6,81.1,81.5,81.6,81.7,81.8 +Israel,64.42,65.04,65.62,66.15,66.65,67.1,67.51,67.89,68.24,68.55,68.85,69.13,69.41,69.68,69.93,70.17,70.39,70.6,70.78,70.96,71.13,71.33,71.54,71.78,72.04,72.33,72.62,72.9,73.19,73.47,73.74,73.99,74.49,74.78,75.1,74.92,75.29,75.65,76.24,76.7,76.5,76.3,76.9,77.1,77.4,77.7,77.9,78.1,78.5,78.6,78.8,78.6,79.1,79.5,79.7,79.6,80.3,80.6,81.0,81.6,81.6,82.1,82.0,81.3,82.1,82.91 +Italy,65.3,65.93,66.56,67.88,68.23,67.62,67.79,68.85,69.3,69.19,69.82,69.21,69.32,70.37,70.24,70.99,71.03,70.85,70.87,71.62,71.87,72.15,72.09,72.81,72.72,73.07,73.44,73.78,74.11,74.07,74.46,74.93,74.75,75.51,75.62,75.94,76.36,76.54,76.94,77.0,77.0,77.3,77.6,77.8,78.1,78.3,78.7,78.9,79.3,79.6,79.8,80.1,80.1,80.9,81.1,81.2,81.3,81.5,81.6,81.9,82.0,82.0,82.1,82.1,82.2,82.3 +Jamaica,58.02,59.06,60.07,61.03,61.95,62.83,63.66,64.47,65.21,65.91,66.57,67.17,67.74,68.25,68.73,69.17,69.58,69.99,70.36,70.72,71.06,71.39,71.71,72.0,72.29,72.58,72.89,73.21,73.52,73.82,74.1,74.34,74.55,74.7,74.79,74.84,74.85,74.85,74.83,74.8,74.9,74.9,74.8,74.8,74.7,74.5,74.4,74.5,74.6,74.4,74.2,74.5,74.8,75.0,75.4,75.5,75.3,75.1,74.8,74.8,74.6,74.7,74.8,74.8,75.0,75.2 +Japan,60.98,63.02,63.36,64.6,65.76,65.62,65.49,67.11,67.49,67.78,68.43,68.71,69.79,70.26,70.31,71.12,71.41,71.73,71.96,72.05,72.87,73.39,73.45,73.88,74.38,74.78,75.35,75.67,76.18,76.16,76.57,77.08,77.11,77.5,77.8,78.22,78.63,78.54,78.97,79.0,79.1,79.3,79.4,79.8,79.7,80.2,80.4,80.5,80.6,81.0,81.3,81.6,81.7,81.9,82.0,82.2,82.4,82.5,82.7,82.7,82.6,82.9,83.0,83.1,83.2,83.3 +Jordan,45.56,46.45,47.34,48.23,49.09,49.95,50.8,51.65,52.48,53.3,54.12,54.94,55.75,56.55,57.35,58.13,58.9,59.66,60.42,61.15,61.87,62.59,63.3,63.98,64.64,65.28,65.89,66.46,67.0,67.51,68.0,68.45,68.9,69.33,69.75,70.15,70.52,70.87,71.2,71.5,71.9,72.2,72.2,72.4,72.5,72.6,72.8,73.0,73.2,73.4,73.6,73.8,74.0,74.1,74.5,75.5,76.3,76.9,77.5,77.9,78.1,78.2,78.3,78.4,78.5,78.6 +Kazakhstan,54.67,55.15,55.63,56.11,56.58,57.05,57.51,57.98,58.44,58.91,59.38,59.85,60.31,60.79,61.24,61.69,62.12,62.53,62.92,63.27,63.6,63.89,64.16,64.4,64.64,64.88,65.13,65.41,65.71,66.05,66.43,66.84,67.28,67.7,68.07,68.34,68.48,68.47,68.31,68.0,67.6,67.1,65.3,64.6,63.6,63.5,63.9,64.2,65.0,64.9,65.1,65.4,65.3,65.3,65.3,65.3,65.8,67.1,68.2,68.5,69.1,69.7,70.0,70.2,70.2,70.2 +Kenya,42.33,42.71,43.16,43.64,44.17,44.75,45.37,46.03,46.72,47.42,48.13,48.82,49.48,50.13,50.75,51.35,51.96,52.57,53.19,53.83,54.45,55.08,55.69,56.29,56.89,57.49,58.1,58.74,59.36,59.96,60.52,61.02,61.42,61.74,61.96,62.09,62.13,62.1,62.0,61.8,61.1,60.3,59.5,58.7,58.1,57.4,56.7,56.1,55.8,55.6,55.6,55.7,55.8,56.2,57.2,58.4,59.8,60.8,61.9,62.9,63.7,64.3,64.8,65.0,65.1,65.2 
+Kiribati,42.25,42.65,43.05,43.44,43.85,44.25,44.64,45.04,45.45,45.84,46.24,46.64,47.03,47.44,47.84,48.23,48.63,49.02,49.42,49.82,50.21,50.61,51.02,51.41,51.81,52.2,52.58,52.97,53.36,53.75,54.17,54.62,55.08,55.56,56.04,56.5,56.93,57.33,57.68,58.0,58.2,58.4,58.4,58.7,58.9,59.2,59.4,59.5,59.6,59.8,60.1,60.2,60.4,60.6,60.8,61.0,61.2,61.5,61.7,61.9,62.1,62.3,62.6,62.8,63.0,63.2 +Kuwait,52.95,54.13,55.27,56.36,57.43,58.45,59.44,60.38,61.29,62.15,62.98,63.77,64.53,65.24,65.92,66.58,67.2,67.8,68.38,68.93,69.46,69.97,70.46,70.93,71.39,71.85,72.29,72.72,73.14,73.58,74.0,74.41,74.81,75.22,75.59,75.95,76.29,76.62,76.92,77.2,64.4,80.0,78.7,77.6,76.5,76.0,76.2,76.3,77.3,77.7,77.6,78.2,78.5,78.1,77.7,77.7,77.7,77.3,77.4,78.5,79.0,79.1,79.7,80.2,80.3,80.4 +Kyrgyz Republic,52.07,52.52,52.96,53.41,53.86,54.31,54.75,55.2,55.64,56.09,56.54,56.99,57.44,57.9,58.34,58.76,59.19,59.58,59.95,60.3,60.61,60.88,61.14,61.38,61.6,61.83,62.05,62.3,62.57,62.89,63.23,63.62,64.04,64.45,64.86,65.23,65.54,65.77,65.93,66.0,65.9,65.6,65.3,65.0,65.1,65.2,65.3,65.6,65.8,65.9,66.0,65.9,66.0,66.2,66.5,66.7,67.0,67.3,67.7,67.9,68.5,69.0,69.4,69.6,69.8,70.0 +Lao,39.88,40.13,40.37,40.62,40.86,41.11,41.37,41.62,41.87,42.13,42.38,42.64,42.89,43.13,43.39,43.64,43.89,44.15,44.41,44.66,44.91,45.16,45.39,45.62,45.85,46.05,46.26,46.47,46.69,46.91,47.17,47.45,47.76,48.12,48.54,49.02,49.56,50.16,50.8,51.5,52.0,52.4,52.8,53.2,53.6,54.0,54.4,54.9,55.5,56.1,56.6,57.6,58.4,59.3,60.1,60.8,61.7,62.5,63.3,64.1,65.0,65.6,66.1,66.6,67.1,67.6 +Latvia,60.48,61.88,63.19,64.38,65.46,66.44,67.31,68.07,69.53,70.37,70.6,69.97,70.36,71.62,71.29,71.26,70.94,70.56,70.29,70.31,70.66,70.35,70.29,70.22,69.37,69.48,69.56,69.45,68.93,69.23,69.18,69.75,69.51,69.56,69.72,71.09,71.14,71.05,70.55,69.6,69.1,68.4,66.7,65.7,66.5,68.6,69.3,69.0,70.0,70.5,70.0,70.4,70.8,71.2,71.1,70.8,71.3,72.4,73.3,73.9,74.6,75.1,75.0,75.2,75.4,75.6 +Lebanon,59.61,60.04,60.45,60.85,61.23,61.6,61.95,62.28,62.6,62.9,63.19,63.47,63.74,64.0,64.25,64.5,64.76,65.01,65.27,65.52,65.75,65.98,66.18,66.37,66.54,66.69,66.83,66.96,67.08,67.21,67.36,67.52,67.68,67.87,68.07,68.29,68.53,68.78,69.03,69.3,71.9,72.2,72.5,73.0,73.4,74.0,74.4,74.9,75.6,75.9,76.3,76.6,76.9,77.1,77.3,77.4,77.5,77.8,77.9,78.1,76.6,78.5,78.6,78.7,78.9,79.1 +Lesotho,41.53,42.11,42.72,43.33,43.96,44.59,45.22,45.85,46.46,47.02,47.54,47.97,48.32,48.59,48.79,48.95,49.09,49.24,49.43,49.67,49.96,50.31,50.7,51.14,51.63,52.17,52.75,53.38,54.01,54.65,55.25,55.82,56.34,56.83,57.31,57.88,58.51,59.21,59.92,60.5,60.6,60.4,60.1,59.2,58.7,57.9,56.6,54.6,52.9,50.7,48.9,47.0,45.4,44.2,43.1,43.1,43.3,44.5,45.5,46.4,46.7,46.1,45.6,45.4,47.1,48.86 +Liberia,33.11,33.36,33.6,33.84,34.07,34.28,34.51,34.73,34.98,35.24,35.54,35.88,36.28,36.73,37.23,37.77,38.33,38.91,39.49,40.1,40.75,41.43,42.16,42.91,43.68,44.45,45.21,45.92,46.57,47.14,47.6,47.97,48.25,48.46,48.58,48.62,48.59,48.55,48.53,48.6,51.5,51.8,50.1,48.9,50.9,50.4,53.8,54.4,55.2,55.8,56.3,55.4,55.2,57.9,58.4,58.8,59.3,59.9,60.3,60.8,61.5,62.3,62.9,61.8,63.2,64.63 +Libya,38.07,37.73,37.66,37.89,38.39,39.18,40.22,41.5,42.97,44.59,46.28,48.0,49.69,51.28,52.77,54.15,55.45,56.69,57.88,59.01,60.11,61.16,62.16,63.13,64.06,64.95,65.81,66.62,67.4,68.13,68.82,69.46,70.04,70.58,71.09,71.56,72.03,72.48,72.94,73.4,73.7,73.8,74.2,74.4,74.6,74.6,74.8,74.8,74.9,74.8,75.0,75.0,75.1,75.2,75.4,75.5,75.5,75.6,75.7,75.9,60.5,75.5,75.8,75.0,74.1,73.21 
+Lithuania,63.9,64.52,65.14,65.77,66.38,66.99,67.59,68.19,67.73,70.33,70.52,69.46,70.64,72.0,71.76,71.92,71.99,71.68,71.3,71.16,72.1,71.34,71.7,71.63,71.24,71.38,71.14,70.93,70.8,70.78,70.77,71.17,71.09,70.6,70.78,72.45,72.26,72.1,71.79,71.5,70.5,70.3,69.1,68.7,69.0,70.2,71.1,71.3,71.8,72.1,71.6,72.1,72.1,72.2,71.7,71.5,71.4,72.1,73.6,73.9,74.3,74.7,74.9,75.0,75.2,75.4 +Luxembourg,65.38,65.71,66.04,66.37,66.67,66.98,67.27,67.55,67.83,68.99,69.49,68.59,68.8,68.98,69.31,69.21,69.59,70.17,69.73,69.47,69.35,70.59,70.34,70.42,70.37,70.31,71.61,71.57,72.25,72.42,72.22,72.31,73.19,72.94,73.51,74.44,73.97,74.57,74.49,75.2,75.5,75.8,76.2,76.5,76.9,77.1,77.4,77.7,78.1,78.5,78.7,79.0,79.1,79.5,80.0,80.3,80.6,81.0,81.2,81.3,81.5,81.7,81.9,82.1,82.2,82.3 +"Macao, China",60.25,60.79,61.32,61.84,62.37,62.89,63.41,63.92,64.43,64.93,65.42,65.9,66.36,66.81,67.24,67.66,68.06,68.45,68.83,69.2,69.56,69.91,70.26,70.61,70.95,71.29,71.62,71.94,72.26,72.57,72.88,73.17,73.46,73.75,74.03,74.31,74.58,74.84,75.1,75.36,75.61,75.86,76.1,76.33,76.56,76.78,77.0,77.21,77.42,77.63,77.83,78.04,78.25,78.46,78.67,78.89,79.1,79.32,79.54,79.75,79.97,80.19,80.4,80.61,80.82,81.03 +"Macedonia, FYR",53.65,54.61,55.53,56.4,57.25,58.04,58.79,59.51,60.2,60.85,61.49,62.11,62.72,63.32,63.92,64.51,65.08,65.62,66.14,66.63,67.08,67.48,67.83,68.14,68.41,68.61,68.76,68.88,68.98,69.08,69.21,69.4,69.63,69.92,70.26,70.6,70.93,71.23,71.48,71.7,71.7,71.6,71.5,71.7,71.8,72.1,72.3,72.4,72.6,72.9,73.0,73.3,73.4,73.6,73.8,74.1,74.3,74.5,74.7,75.2,75.6,75.8,76.0,76.2,76.5,76.8 +Madagascar,36.69,37.28,37.86,38.45,39.03,39.62,40.21,40.79,41.38,41.96,42.54,43.12,43.7,44.28,44.85,45.43,46.01,46.6,47.18,47.77,48.36,48.94,49.5,50.06,50.59,51.12,51.63,52.12,52.58,53.01,53.36,53.64,53.86,54.03,54.19,54.38,54.63,54.98,55.43,56.0,56.2,56.4,56.3,56.8,57.2,57.6,58.0,58.3,58.8,59.1,59.6,59.8,60.1,60.6,61.2,61.7,62.0,62.2,62.3,62.4,62.6,62.8,63.0,63.3,63.5,63.7 +Malawi,36.45,36.62,36.81,37.02,37.24,37.48,37.72,37.99,38.25,38.51,38.76,39.02,39.25,39.49,39.75,40.03,40.36,40.73,41.16,41.62,42.09,42.55,43.0,43.41,43.79,44.16,44.54,44.92,45.31,45.72,46.13,46.53,46.91,47.26,47.6,47.9,48.17,48.42,48.64,48.8,48.6,48.3,48.0,47.4,46.9,46.3,45.8,45.3,45.1,45.4,45.9,46.4,47.0,47.5,48.5,49.6,51.0,52.4,53.9,55.4,56.6,58.0,59.3,60.1,60.5,60.9 +Malaysia,54.05,54.72,55.39,56.06,56.72,57.37,58.01,58.65,59.27,59.89,60.48,61.07,61.63,62.17,62.71,63.21,63.7,64.17,64.63,65.08,65.51,65.93,66.34,66.73,67.13,67.5,67.86,68.21,68.56,68.89,69.22,69.53,69.84,70.14,70.45,70.73,71.01,71.28,71.54,71.8,72.0,72.2,72.4,72.4,72.4,72.5,72.8,73.0,73.1,73.3,73.6,73.8,73.9,74.0,74.3,74.5,74.5,74.5,74.3,74.4,74.6,74.7,74.9,75.1,75.3,75.5 +Maldives,33.9,34.18,34.49,34.86,35.27,35.72,36.22,36.78,37.39,38.07,38.82,39.64,40.54,41.48,42.47,43.48,44.49,45.48,46.44,47.37,48.25,49.12,49.98,50.82,51.69,52.59,53.53,54.51,55.53,56.58,57.62,58.64,59.64,60.6,61.51,62.41,63.3,64.19,65.08,66.0,66.7,67.3,67.9,68.6,69.3,70.0,70.8,71.7,72.3,73.0,73.7,74.4,75.3,74.7,76.9,77.5,78.1,78.5,78.9,79.2,79.6,79.8,79.9,80.0,80.0,80.0 +Mali,27.34,27.71,28.04,28.34,28.6,28.84,29.04,29.23,29.42,29.61,29.83,30.08,30.4,30.79,31.26,31.8,32.41,33.07,33.77,34.51,35.27,36.04,36.82,37.61,38.39,39.18,39.97,40.79,41.61,42.45,43.3,44.17,45.02,45.86,46.68,47.45,48.18,48.86,49.47,50.0,50.5,50.8,51.2,51.2,51.4,51.8,52.2,50.9,53.5,53.5,54.1,54.6,55.5,56.2,56.9,57.4,58.0,58.5,58.9,59.2,59.6,59.8,59.8,60.0,60.2,60.4 
+Malta,66.02,66.17,66.35,66.55,66.79,67.06,67.34,67.65,67.97,68.32,68.67,69.02,69.37,69.7,70.03,70.36,70.67,70.98,71.29,71.6,71.9,72.2,72.49,72.78,73.07,73.36,73.63,73.92,74.19,74.47,74.74,75.01,75.28,75.54,75.81,76.08,76.33,76.59,76.84,77.1,77.3,77.5,77.9,78.2,78.4,78.5,78.8,78.9,79.0,79.2,79.4,79.8,80.1,80.3,80.7,81.0,80.9,80.7,81.2,81.3,81.3,81.6,81.7,82.0,82.1,82.2 +Martinique,54.51,55.23,55.93,56.61,57.28,57.93,58.57,59.2,59.81,60.41,61.0,61.58,62.16,62.72,63.28,63.84,64.39,64.93,65.46,65.99,66.51,67.02,67.53,68.02,68.51,69.0,69.47,69.93,70.38,70.82,71.25,71.68,72.09,72.5,72.9,73.29,73.67,74.05,74.42,74.79,75.15,75.51,75.86,76.2,76.54,76.88,77.22,77.55,77.88,78.19,78.5,78.78,79.05,79.31,79.55,79.78,80.01,80.24,80.48,80.71,80.95,81.18,81.41,81.64,81.86,82.08 +Mauritania,37.95,38.53,39.14,39.77,40.42,41.09,41.78,42.48,43.2,43.91,44.62,45.31,45.96,46.59,47.18,47.73,48.26,48.78,49.27,49.77,50.25,50.73,51.2,51.69,52.19,52.73,53.29,53.89,54.51,55.13,55.75,56.34,56.9,57.41,57.86,58.28,58.64,58.96,59.25,59.5,60.2,60.4,60.7,60.7,61.2,61.5,62.0,62.5,63.2,63.8,64.2,64.9,65.5,65.9,66.3,67.0,67.5,67.9,68.2,68.6,68.8,69.1,69.3,69.6,69.7,69.8 +Mauritius,48.57,49.61,50.68,51.78,52.92,54.09,55.28,56.46,57.63,58.74,59.75,60.64,61.38,61.97,62.4,62.67,62.85,62.97,63.05,63.14,63.27,63.45,63.68,63.99,64.37,64.83,65.34,65.87,66.41,66.92,67.34,67.7,67.96,68.14,68.26,68.38,68.53,68.74,68.99,69.3,69.6,69.7,69.8,70.0,70.3,70.5,70.7,71.0,71.2,71.4,71.6,71.7,71.9,72.1,72.4,72.5,72.7,72.9,73.2,73.4,73.7,74.1,74.2,74.3,74.5,74.7 +Mayotte,45.38,46.68,47.92,49.11,50.24,51.32,52.34,53.3,54.22,55.09,55.92,56.72,57.5,58.25,58.98,59.7,60.39,61.07,61.73,62.36,62.99,63.59,64.17,64.74,65.3,65.84,66.36,66.88,67.38,67.86,68.34,68.8,69.25,69.69,70.12,70.54,70.95,71.35,71.75,72.14,72.53,72.91,73.28,73.64,74.0,74.35,74.7,75.03,75.36,75.69,76.01,76.33,76.64,76.95,77.24,77.53,77.8,78.05,78.29,78.52,78.74,78.96,79.19,79.42,79.65,79.88 +Mexico,49.27,50.37,51.42,52.43,53.39,54.29,55.14,55.94,56.67,57.34,57.95,58.49,58.96,59.4,59.78,60.15,60.53,60.91,61.32,61.77,62.25,62.75,63.29,63.83,64.39,64.95,65.51,66.05,66.58,67.09,67.58,68.05,68.52,68.97,69.4,69.84,70.26,70.67,71.09,71.5,71.9,72.1,72.4,72.7,73.0,73.3,73.6,73.7,74.1,74.6,74.9,74.9,74.9,75.2,75.1,75.4,75.6,75.4,75.3,75.4,75.7,75.7,75.4,75.6,75.9,76.2 +"Micronesia, Fed. 
Sts.",53.56,53.92,54.28,54.65,55.01,55.37,55.73,56.09,56.45,56.82,57.18,57.54,57.9,58.26,58.63,58.99,59.36,59.73,60.1,60.48,60.89,61.3,61.71,62.12,62.5,62.85,63.14,63.37,63.53,63.64,63.71,63.77,63.81,63.86,63.92,63.99,64.07,64.15,64.23,64.3,64.5,64.7,64.9,65.1,65.4,65.7,65.9,66.1,66.3,66.6,66.8,66.0,67.3,67.4,67.6,67.7,67.9,68.0,68.1,68.3,68.4,68.6,68.7,68.8,68.9,69.0 +Moldova,58.5,58.96,59.42,59.85,60.27,60.68,61.07,61.46,61.84,62.22,62.61,62.99,63.38,63.77,64.14,64.48,64.78,65.03,65.23,65.39,65.48,65.55,65.58,65.6,65.6,65.57,65.52,65.47,65.41,65.4,65.48,65.68,65.98,66.38,66.83,67.29,67.69,67.98,68.16,68.2,67.4,67.6,67.4,65.8,65.4,66.1,67.9,68.5,68.4,68.6,69.2,69.6,69.9,70.2,69.5,69.8,70.0,70.4,70.6,70.5,72.3,72.4,73.3,73.6,73.9,74.2 +Mongolia,43.09,43.41,43.83,44.34,44.96,45.66,46.46,47.33,48.25,49.2,50.15,51.08,51.94,52.74,53.48,54.16,54.8,55.43,56.02,56.58,57.08,57.49,57.82,58.06,58.22,58.31,58.36,58.4,58.46,58.56,58.73,59.0,59.34,59.76,60.22,60.71,61.18,61.61,61.98,62.3,62.3,62.2,62.0,62.0,61.7,61.7,61.9,62.1,62.3,62.5,62.7,62.9,63.1,63.4,63.6,64.0,64.4,64.8,65.0,65.2,65.6,66.0,66.4,66.8,67.1,67.4 +Montenegro,59.32,59.59,59.91,60.31,60.78,61.3,61.87,62.5,63.17,63.86,64.54,65.21,65.86,66.47,67.05,67.62,68.19,68.78,69.36,69.94,70.48,70.99,71.41,71.78,72.07,72.33,72.55,72.75,72.95,73.16,73.35,73.52,73.68,73.83,73.96,74.08,74.21,74.35,74.47,74.6,74.4,74.2,73.9,73.7,73.5,73.4,73.3,73.1,73.0,73.3,73.5,74.0,74.5,74.8,75.0,75.2,75.6,76.0,76.3,76.5,76.7,76.8,76.9,77.1,77.2,77.3 +Morocco,45.84,46.21,46.58,46.98,47.39,47.81,48.25,48.7,49.17,49.64,50.11,50.6,51.09,51.58,52.06,52.54,53.0,53.46,53.91,54.34,54.77,55.19,55.62,56.08,56.56,57.11,57.72,58.39,59.13,59.93,60.77,61.63,62.49,63.33,64.14,64.91,65.66,66.38,67.06,67.7,68.1,68.4,68.6,69.1,69.5,70.0,70.4,70.8,71.1,71.5,71.8,72.0,72.3,72.5,72.7,72.9,73.1,73.3,73.5,73.7,73.9,74.1,74.3,74.4,74.6,74.8 +Mozambique,32.26,32.92,33.58,34.25,34.91,35.58,36.23,36.89,37.54,38.17,38.79,39.4,39.98,40.54,41.1,41.66,42.21,42.78,43.37,43.97,44.58,45.21,45.85,46.46,47.06,47.61,48.1,48.52,48.88,49.17,49.4,49.57,49.72,49.87,50.02,50.21,50.45,50.74,51.08,51.5,51.7,52.1,52.3,52.6,52.7,52.6,52.5,52.6,52.6,52.3,52.8,52.7,52.9,53.0,52.9,53.0,53.2,54.0,54.4,54.4,54.5,54.5,54.8,56.1,57.1,58.12 +Myanmar,33.8,35.24,36.53,37.69,38.71,39.6,40.36,41.03,41.65,42.25,42.9,43.64,44.47,45.4,46.4,47.39,48.31,49.11,49.78,50.31,50.72,51.09,51.44,51.78,52.15,52.54,52.93,53.31,53.69,54.07,54.44,54.8,55.16,55.52,55.87,56.23,56.58,56.93,57.26,57.6,57.8,58.1,58.4,58.8,59.0,59.4,59.7,60.1,60.4,60.8,61.3,61.7,62.3,62.8,63.4,64.0,64.6,59.4,65.6,66.0,66.4,66.8,67.2,67.6,68.0,68.4 +Namibia,40.72,41.49,42.23,42.96,43.69,44.39,45.09,45.76,46.42,47.07,47.7,48.31,48.9,49.48,50.05,50.61,51.17,51.71,52.26,52.81,53.36,53.91,54.44,54.98,55.51,56.04,56.56,57.07,57.57,58.06,58.54,59.01,59.45,59.87,60.27,60.65,61.0,61.3,61.54,61.7,61.9,62.0,62.0,61.5,60.5,59.3,58.1,56.7,55.4,54.0,53.4,52.7,52.4,52.5,53.1,54.9,57.5,59.1,60.3,61.4,62.6,63.6,63.9,64.1,64.2,64.3 +Nepal,35.53,36.0,36.48,36.96,37.43,37.9,38.38,38.85,39.32,39.8,40.26,40.74,41.21,41.67,42.14,42.6,43.05,43.51,43.97,44.43,44.91,45.41,45.92,46.47,47.05,47.64,48.28,48.94,49.63,50.32,51.06,51.81,52.57,53.36,54.17,54.98,55.83,56.68,57.53,58.4,59.1,60.0,60.2,61.0,61.7,62.5,63.4,63.9,64.6,65.2,65.9,65.9,66.8,67.0,67.4,67.8,68.1,68.4,68.7,69.0,69.3,69.7,69.9,70.2,69.7,69.2 
+Netherlands,71.5,72.12,71.7,72.39,72.51,72.52,72.97,73.13,73.17,73.35,73.54,73.21,73.33,73.71,73.58,73.52,73.79,73.6,73.51,73.57,73.81,73.72,74.17,74.56,74.49,74.61,75.2,75.11,75.59,75.72,75.93,76.01,76.21,76.28,76.34,76.31,76.78,76.98,76.82,77.0,77.2,77.3,77.2,77.5,77.6,77.6,77.9,78.1,78.0,78.1,78.3,78.5,78.7,79.1,79.6,79.9,80.2,80.3,80.6,80.8,80.9,81.0,81.2,81.3,81.3,81.3 +Netherlands Antilles,58.96,60.02,61.0,61.89,62.7,63.43,64.08,64.65,65.15,65.6,65.99,66.34,66.67,67.0,67.33,67.67,68.03,68.41,68.81,69.22,69.63,70.05,70.45,70.84,71.21,71.58,71.94,72.29,72.64,72.96,73.27,73.56,73.8,74.02,74.19,74.33,74.42,74.49,74.52,74.54,74.53,74.52,74.5,74.49,74.48,74.48,74.5,74.53,74.57,74.65,74.76,74.91,75.09,75.3,75.53,75.76,75.98,76.18,76.36,76.52,76.65,76.77,76.89,77.01,77.14,77.27 +New Caledonia,49.51,50.34,51.16,51.96,52.74,53.5,54.25,54.98,55.69,56.38,57.06,57.72,58.36,58.99,59.6,60.19,60.77,61.33,61.89,62.42,62.95,63.46,63.95,64.44,64.91,65.37,65.81,66.25,66.68,67.09,67.5,67.89,68.28,68.65,69.02,69.37,69.72,70.05,70.38,70.7,71.01,71.31,71.6,71.89,72.16,72.43,72.7,72.95,73.21,73.46,73.7,73.94,74.17,74.4,74.62,74.84,75.05,75.26,75.47,75.67,75.88,76.09,76.31,76.52,76.74,76.96 +New Zealand,69.17,69.4,70.25,70.36,70.49,70.75,70.27,70.9,70.82,71.28,71.0,71.26,71.33,71.37,71.3,71.16,71.54,71.2,71.57,71.35,71.8,71.92,71.78,72.03,72.3,72.5,72.25,73.14,73.18,72.98,73.77,73.87,73.97,74.53,74.03,74.28,74.36,74.64,75.05,75.6,75.9,76.2,76.5,76.7,77.0,77.3,77.6,78.0,78.2,78.4,78.6,78.9,79.1,79.4,79.8,79.9,80.1,80.3,80.5,80.8,80.8,81.1,81.4,81.4,81.4,81.4 +Nicaragua,43.38,44.18,44.98,45.78,46.59,47.4,48.22,49.04,49.86,50.69,51.53,52.36,53.19,54.04,54.88,55.74,56.6,57.47,58.33,59.18,60.01,60.8,61.56,62.28,62.95,63.59,64.17,64.73,65.28,65.83,66.38,66.95,67.56,68.2,68.89,69.67,70.51,71.4,72.35,73.3,73.7,73.6,73.9,74.1,74.4,74.7,75.0,73.2,75.6,76.0,76.2,76.3,76.3,76.4,76.6,76.7,76.8,77.0,77.1,77.2,77.4,77.5,77.6,77.8,78.0,78.2 +Niger,35.61,35.72,35.83,35.95,36.08,36.22,36.37,36.51,36.67,36.82,36.97,37.1,37.24,37.36,37.49,37.61,37.73,37.88,38.05,38.24,38.45,38.69,38.95,39.25,39.57,39.97,40.4,40.9,41.44,42.0,42.58,43.13,43.66,44.15,44.63,45.09,45.57,46.07,46.62,47.2,47.9,48.2,48.6,49.1,49.5,50.2,50.6,51.2,51.8,52.4,52.9,53.7,54.4,55.2,55.9,56.6,57.3,58.0,58.6,59.2,59.6,60.0,60.4,60.7,61.0,61.3 +Nigeria,35.25,35.74,36.25,36.79,37.35,37.93,38.53,39.14,39.76,40.39,41.0,41.61,42.19,42.75,43.29,43.81,38.31,33.47,31.63,41.79,46.56,47.16,47.77,48.38,49.0,49.62,50.24,50.84,51.42,51.95,52.41,52.8,53.12,53.36,53.54,53.67,53.78,53.88,53.98,54.1,54.3,54.4,54.5,54.9,55.0,55.0,55.0,55.1,55.2,55.2,55.4,55.3,55.6,56.1,56.8,57.4,58.3,59.2,60.3,61.2,62.0,62.6,63.3,63.7,64.6,65.51 +North Korea,26.78,24.76,31.74,42.66,46.7,48.18,49.16,49.73,50.43,50.9,51.25,51.64,52.15,52.86,53.76,54.84,55.97,57.07,58.1,59.06,59.93,60.74,61.5,62.22,62.88,63.49,64.04,64.53,64.98,65.39,65.75,66.08,66.4,66.69,67.0,67.36,67.78,68.22,68.63,68.9,69.2,69.4,69.6,69.7,58.6,58.7,58.8,58.9,59.0,59.1,59.2,59.3,69.9,70.0,70.2,70.4,70.6,70.9,71.0,71.2,71.4,71.6,71.8,71.9,72.1,72.3 +Norway,72.58,72.72,73.2,73.28,73.5,73.55,73.5,73.5,73.63,73.66,73.67,73.55,73.2,73.7,73.83,74.11,74.18,74.07,73.78,74.19,74.3,74.46,74.56,74.88,74.93,75.17,75.51,75.54,75.54,75.8,76.0,76.13,76.19,76.36,76.07,76.21,76.07,76.17,76.52,76.6,77.0,77.1,77.5,77.7,77.9,78.2,78.3,78.3,78.5,78.6,78.9,79.1,79.5,79.8,80.2,80.4,80.6,80.8,80.8,81.1,81.1,81.6,81.6,82.0,82.0,82.0 
+Oman,35.74,36.78,37.81,38.82,39.82,40.8,41.78,42.75,43.7,44.64,45.57,46.47,47.37,48.26,49.13,49.97,50.8,51.62,52.43,53.26,54.14,55.07,56.06,57.11,58.2,59.32,60.45,61.57,62.65,63.7,64.69,65.65,66.59,67.48,68.35,69.17,69.95,70.7,71.41,72.1,72.5,72.9,73.3,73.6,73.9,74.2,74.5,74.8,75.1,75.2,75.4,75.4,75.6,75.8,76.0,76.0,76.0,76.2,76.2,76.1,76.3,76.6,76.8,77.0,77.2,77.4 +Pakistan,36.85,38.07,39.26,40.42,41.56,42.67,43.75,44.8,45.81,46.79,47.73,48.63,49.47,50.27,51.01,51.7,52.34,52.95,53.52,54.06,54.6,55.12,55.64,56.16,56.68,57.17,57.63,58.05,58.44,58.79,59.13,59.45,59.77,60.09,60.43,60.77,61.11,61.45,61.78,62.1,62.2,62.1,62.0,61.9,61.8,61.9,61.8,62.0,62.1,62.3,62.5,62.6,62.8,63.1,62.2,63.7,63.8,64.1,64.3,64.5,64.9,65.1,65.4,65.6,65.9,66.2 +Panama,56.42,56.99,57.56,58.14,58.72,59.31,59.89,60.47,61.05,61.62,62.17,62.71,63.22,63.72,64.21,64.7,65.18,65.65,66.15,66.66,67.18,67.72,68.26,68.81,69.35,69.88,70.38,70.85,71.3,71.72,72.1,72.47,72.8,73.13,73.45,73.76,74.06,74.34,74.62,74.9,75.0,75.0,75.2,75.2,75.3,75.4,75.6,75.8,76.2,76.5,76.7,76.9,77.0,77.1,77.2,77.2,77.3,77.3,77.3,77.3,77.4,77.5,77.6,77.9,78.2,78.5 +Papua New Guinea,34.02,34.53,35.04,35.54,36.03,36.53,37.02,37.51,38.04,38.6,39.2,39.87,40.6,41.39,42.22,43.07,43.92,44.74,45.53,46.27,46.97,47.63,48.27,48.9,49.54,50.21,50.91,51.65,52.4,53.11,53.74,54.26,54.65,54.92,55.08,55.19,55.3,55.47,55.7,56.0,56.0,56.2,56.4,56.7,56.9,57.0,57.2,56.5,57.4,57.5,57.6,57.6,57.7,57.7,57.9,58.0,58.2,58.6,58.8,59.1,59.4,59.7,60.2,60.5,60.9,61.3 +Paraguay,64.04,64.16,64.33,64.52,64.76,65.03,65.33,65.65,66.0,66.35,66.7,67.03,67.33,67.61,67.87,68.11,68.37,68.63,68.9,69.2,69.49,69.78,70.06,70.32,70.57,70.81,71.04,71.28,71.51,71.73,71.97,72.19,72.41,72.64,72.87,73.11,73.36,73.62,73.91,74.2,74.2,74.1,74.1,74.0,74.1,74.1,74.2,74.2,74.3,74.2,74.2,74.1,74.1,73.8,74.0,74.0,74.0,74.0,74.0,74.0,74.0,74.1,74.1,74.3,74.4,74.5 +Peru,43.99,44.43,44.91,45.41,45.95,46.51,47.1,47.72,48.34,48.95,49.56,50.14,50.7,51.25,51.79,52.38,53.03,53.74,54.52,55.36,56.2,57.04,57.85,58.6,59.31,59.99,60.63,61.28,61.93,62.59,63.25,63.9,64.55,65.18,65.8,66.41,66.99,67.57,68.14,68.7,69.2,69.5,70.0,70.5,71.1,71.7,72.4,73.1,73.9,74.6,75.2,75.7,76.2,76.7,77.2,77.7,77.9,78.2,78.2,78.4,78.5,78.7,79.1,79.3,79.5,79.7 +Philippines,55.43,55.83,56.23,56.61,56.99,57.36,57.74,58.11,58.46,58.82,59.17,59.53,59.87,60.21,60.56,60.91,61.26,61.6,61.94,62.26,62.54,62.77,62.95,63.1,63.21,63.32,63.44,63.6,63.81,64.06,64.37,64.74,65.13,65.53,65.95,66.35,66.72,67.05,67.34,67.6,67.9,68.2,68.3,68.6,68.8,68.9,69.0,69.0,69.2,69.1,69.0,69.0,69.1,69.1,69.1,69.2,69.7,69.8,69.9,70.1,70.2,70.3,70.3,70.7,71.0,71.3 +Poland,59.68,60.87,61.96,62.97,63.9,64.74,65.5,65.97,65.59,67.92,68.04,67.71,68.64,68.87,69.58,69.99,69.69,70.33,69.83,69.96,69.76,70.95,70.95,71.46,70.88,70.88,70.78,70.71,71.05,70.4,71.38,71.45,71.29,71.03,70.78,71.07,71.12,71.49,71.25,70.9,70.7,71.1,71.7,71.7,71.9,72.4,72.7,73.0,73.1,73.8,74.2,74.6,74.9,75.0,75.1,75.2,75.2,75.4,75.7,76.2,76.5,76.7,77.3,77.4,77.6,77.8 +Portugal,58.71,59.81,61.11,62.25,61.42,61.22,61.49,63.79,62.97,64.23,62.85,64.37,65.0,65.22,66.17,65.67,66.57,66.88,66.49,67.14,66.91,69.23,68.63,69.18,68.9,69.12,70.37,70.83,71.64,71.71,71.9,72.73,72.65,72.94,73.22,73.61,74.0,74.02,74.58,74.2,74.2,74.6,74.7,75.5,75.5,75.5,75.8,76.1,76.4,76.8,76.8,77.3,77.6,78.2,78.4,79.0,79.2,79.4,79.6,79.9,80.2,80.4,80.7,80.7,80.8,80.9 +Puerto 
Rico,61.57,62.94,64.16,65.22,66.13,66.87,67.48,67.94,68.31,68.58,68.8,69.0,69.21,69.45,69.72,70.03,70.36,70.7,71.03,71.35,71.66,71.98,72.26,72.53,72.77,72.98,73.15,73.28,73.38,73.47,73.56,73.65,73.76,73.89,74.0,74.07,74.08,74.04,73.93,73.8,73.8,73.7,73.8,73.1,73.3,73.6,74.5,75.1,75.2,75.6,75.8,76.2,76.5,76.5,76.6,76.8,76.9,77.0,77.1,77.1,77.4,77.7,77.9,78.2,78.5,78.8 +Qatar,53.86,54.67,55.47,56.26,57.04,57.81,58.58,59.33,60.08,60.82,61.57,62.31,63.06,63.79,64.53,65.25,65.95,66.64,67.29,67.91,68.49,69.03,69.52,69.98,70.4,70.79,71.15,71.49,71.83,72.14,72.45,72.75,73.01,73.27,73.51,73.74,73.94,74.14,74.32,74.5,74.4,74.5,74.5,74.4,74.4,74.5,74.6,74.6,74.6,74.7,75.0,75.0,75.2,75.8,76.3,76.7,77.3,77.9,78.5,79.2,79.7,79.9,79.9,79.8,79.7,79.6 +Reunion,45.98,47.28,48.53,49.72,50.86,51.94,52.96,53.93,54.85,55.73,56.57,57.37,58.15,58.9,59.64,60.36,61.06,61.74,62.41,63.06,63.69,64.3,64.89,65.46,66.0,66.53,67.05,67.55,68.03,68.51,68.97,69.43,69.87,70.3,70.73,71.14,71.54,71.94,72.32,72.69,73.06,73.41,73.77,74.11,74.45,74.79,75.12,75.44,75.76,76.08,76.38,76.68,76.97,77.26,77.53,77.81,78.08,78.35,78.62,78.88,79.14,79.4,79.65,79.89,80.12,80.35 +Romania,61.13,61.07,61.19,61.47,61.93,62.54,63.29,64.14,65.04,65.92,66.7,67.32,67.74,67.96,68.02,67.98,67.95,68.01,68.16,68.41,68.73,69.06,69.34,69.58,69.75,69.87,69.95,70.01,70.06,70.1,70.12,70.11,70.1,70.08,70.05,70.02,70.0,69.98,69.99,70.0,70.5,70.0,69.8,69.5,69.4,69.1,69.1,69.8,70.6,71.1,71.1,71.2,71.6,72.0,72.4,72.8,73.3,73.2,73.3,73.7,74.5,74.7,74.9,75.1,75.2,75.3 +Russia,57.76,58.16,58.96,60.96,63.35,64.85,63.95,66.84,67.59,68.61,68.85,68.51,68.98,69.77,69.36,69.43,69.21,69.17,68.65,68.76,69.02,68.92,68.89,68.88,68.24,67.98,67.85,67.89,67.61,67.57,67.79,68.25,68.01,67.53,68.19,69.8,69.81,69.66,69.57,69.2,69.1,68.0,65.2,63.8,64.4,65.7,67.0,67.2,65.9,65.1,65.1,64.9,64.7,65.1,65.1,66.7,67.7,67.9,68.8,68.9,69.8,70.4,70.8,70.9,71.0,71.1 +Rwanda,39.99,40.32,40.66,41.0,41.34,41.69,42.03,42.38,42.73,43.07,43.41,43.74,44.05,44.35,44.62,44.85,45.07,45.27,45.44,45.58,45.71,45.81,45.91,46.01,46.13,46.31,46.54,46.81,47.12,47.46,47.88,48.32,48.69,48.88,49.15,49.42,49.69,49.96,50.23,50.5,49.3,48.0,46.7,13.2,43.8,44.6,44.0,45.6,47.2,49.2,51.0,53.5,55.5,57.6,59.6,61.6,63.1,64.1,64.3,65.1,65.3,65.5,65.6,65.7,65.9,66.1 +Samoa,46.08,46.69,47.3,47.9,48.5,49.09,49.69,50.28,50.87,51.45,52.04,52.62,53.21,53.8,54.39,54.98,55.57,56.15,56.75,57.33,57.92,58.5,59.09,59.67,60.26,60.84,61.44,62.02,62.62,63.2,63.79,64.36,64.94,65.51,66.1,66.67,67.27,67.87,68.49,69.1,69.1,69.5,69.7,69.8,70.0,70.2,70.4,70.6,70.7,70.8,71.0,71.2,71.4,71.6,71.8,72.0,72.1,72.3,70.4,72.6,72.7,72.7,73.0,73.1,73.2,73.3 +Sao Tome and Principe,46.1,46.54,47.01,47.52,48.05,48.6,49.18,49.77,50.38,51.01,51.62,52.21,52.79,53.36,53.92,54.47,55.03,55.6,56.19,56.81,57.47,58.13,58.81,59.47,60.09,60.63,61.08,61.42,61.67,61.83,61.93,62.02,62.12,62.24,62.4,62.59,62.79,63.0,63.2,63.4,63.5,63.6,63.7,64.0,64.1,63.9,63.9,64.0,64.4,64.6,64.9,65.0,65.3,65.4,65.5,65.7,65.7,66.0,66.7,66.9,67.2,67.4,67.6,67.8,68.0,68.2 +Saudi Arabia,42.31,42.89,43.47,44.05,44.64,45.23,45.82,46.42,47.02,47.62,48.22,48.84,49.48,50.15,50.88,51.68,52.55,53.51,54.55,55.65,56.82,58.04,59.26,60.48,61.67,62.83,63.95,65.01,66.03,66.99,67.89,68.74,69.54,70.3,71.01,71.66,72.28,72.85,73.39,73.9,74.3,74.6,74.9,75.1,75.5,75.8,76.0,76.3,76.6,76.8,77.1,77.2,77.4,77.5,77.8,77.9,78.2,78.3,78.5,78.7,78.9,79.2,79.3,79.4,79.5,79.6 
+Senegal,34.89,35.39,35.88,36.34,36.78,37.19,37.57,37.93,38.23,38.46,38.63,38.72,38.76,38.74,38.71,38.7,38.74,38.9,39.17,39.59,40.18,40.94,41.85,42.85,43.94,45.07,46.21,47.33,48.4,49.42,50.43,51.44,52.47,53.48,54.45,55.36,56.17,56.86,57.41,57.8,58.0,58.0,58.2,58.2,58.4,58.8,58.9,59.1,59.2,59.7,60.2,60.4,61.3,61.7,62.2,62.5,63.0,63.5,63.9,64.2,64.4,64.6,64.8,65.0,65.3,65.6 +Serbia,58.63,59.11,59.61,60.12,60.63,61.15,61.69,62.23,62.78,63.33,63.88,64.44,64.99,65.53,66.06,66.56,67.05,67.51,67.94,68.34,68.7,69.03,69.32,69.59,69.82,70.03,70.21,70.37,70.53,70.68,70.84,71.01,71.19,71.38,71.58,71.78,71.99,72.17,72.35,72.5,71.4,72.4,72.3,72.1,72.0,71.9,72.1,71.5,71.0,72.1,72.4,72.5,72.7,72.9,73.2,73.6,74.0,74.3,74.6,74.8,75.1,75.4,75.7,75.9,76.2,76.5 +Seychelles,57.55,57.43,57.45,57.57,57.82,58.18,58.65,59.19,59.8,60.42,61.03,61.59,62.08,62.47,62.81,63.11,63.43,63.78,64.18,64.62,65.11,65.59,66.06,66.49,66.9,67.26,67.59,67.89,68.16,68.4,68.63,68.83,69.02,69.17,69.3,69.36,69.37,69.31,69.22,69.1,69.1,69.2,69.3,69.6,69.8,69.9,70.1,70.4,70.7,70.9,71.1,71.3,71.5,71.7,72.0,72.3,72.6,72.9,73.0,73.1,73.4,73.7,73.8,74.0,74.1,74.2 +Sierra Leone,31.66,32.13,32.62,33.1,33.6,34.09,34.59,35.08,35.58,36.07,36.57,37.06,37.57,38.1,38.7,39.38,40.18,41.08,42.08,43.15,44.28,45.39,46.48,47.5,48.45,49.31,50.11,50.83,51.49,52.04,52.5,52.83,53.06,53.16,53.14,52.98,52.72,52.36,51.98,51.6,51.4,51.9,52.1,51.6,50.9,51.9,51.3,49.7,49.2,51.5,51.8,51.6,51.7,52.0,52.3,52.7,53.0,53.6,54.2,55.0,55.6,56.4,57.1,55.2,57.1,59.07 +Singapore,58.62,59.54,60.41,61.24,62.01,62.73,63.39,64.01,64.54,65.02,65.41,65.72,65.97,66.16,66.31,66.46,66.63,66.84,67.09,67.4,67.75,68.12,68.5,68.88,69.26,69.62,69.98,70.34,70.68,71.04,71.39,71.74,72.11,72.49,72.87,73.27,73.68,74.1,74.51,74.9,75.6,76.0,76.2,76.3,76.4,76.7,77.2,77.6,78.0,78.3,78.6,78.9,79.3,79.8,80.0,80.2,80.4,80.6,81.0,81.3,81.5,81.6,81.7,81.9,82.0,82.1 +Slovak Republic,61.35,64.4,65.7,66.76,67.89,68.42,67.51,69.41,69.09,70.42,70.86,70.4,70.79,71.17,70.39,70.53,71.07,70.6,69.91,69.84,69.99,70.46,70.16,70.33,70.45,70.62,70.58,70.59,70.92,70.58,70.82,70.94,70.64,70.88,70.89,71.07,71.24,71.32,71.12,71.0,71.1,71.4,71.9,72.3,72.4,72.8,72.8,72.8,73.0,73.3,73.6,73.8,73.9,74.2,74.3,74.5,74.6,74.9,75.2,75.7,76.1,76.5,77.0,77.4,77.6,77.8 +Slovenia,64.71,65.28,65.83,66.34,66.81,67.25,67.66,68.02,68.34,68.62,68.82,68.98,69.08,69.12,69.14,69.14,69.14,69.17,69.23,69.32,69.47,69.66,69.86,70.09,70.32,70.51,70.66,70.77,70.85,70.89,70.94,71.03,70.74,71.2,71.63,72.17,72.1,72.75,73.19,73.7,73.6,73.8,73.9,74.2,74.6,75.0,75.2,75.4,75.7,76.1,76.3,76.6,76.8,77.2,77.6,77.9,78.2,78.7,79.1,79.5,79.9,80.1,80.3,80.8,80.9,81.0 +Solomon Islands,45.39,45.97,46.53,47.11,47.68,48.26,48.83,49.41,49.98,50.55,51.12,51.69,52.27,52.84,53.42,54.0,54.58,55.16,55.74,56.31,56.91,57.52,58.13,58.74,59.33,59.9,60.43,60.89,61.27,61.53,61.59,61.46,61.16,60.74,60.26,59.84,59.58,59.52,59.7,60.1,60.0,60.4,60.6,60.9,61.1,61.4,61.5,61.6,61.7,61.7,61.7,61.7,61.7,61.7,61.8,61.9,61.9,62.3,62.4,62.7,63.0,63.3,63.5,63.6,64.0,64.4 +Somalia,34.13,34.6,35.07,35.54,36.01,36.47,36.94,37.41,37.87,38.34,38.8,39.26,39.74,40.21,40.68,41.14,41.61,42.08,42.54,42.99,43.44,43.9,44.35,44.8,45.24,45.7,46.15,46.6,47.03,47.46,47.88,48.28,48.65,48.98,49.24,49.36,49.34,49.19,48.98,48.8,47.4,48.4,49.7,49.7,49.9,49.9,49.6,50.3,50.4,50.7,50.9,51.1,51.5,51.6,52.1,52.2,52.4,52.6,52.8,51.6,52.0,53.4,54.1,54.3,54.2,54.1 +South 
Africa,43.92,44.67,45.37,46.03,46.63,47.19,47.71,48.17,48.6,49.01,49.4,49.78,50.14,50.52,50.91,51.3,51.68,52.04,52.41,52.77,53.11,53.44,53.77,54.11,54.47,54.86,55.3,55.77,56.29,56.85,57.44,58.04,58.64,59.22,59.78,60.32,60.83,61.29,61.69,62.0,62.5,62.4,63.0,62.8,62.7,61.6,60.0,58.9,57.9,56.4,55.9,54.8,53.7,52.8,52.7,52.5,53.0,53.4,53.9,54.9,56.6,59.0,60.7,61.2,61.3,61.4 +South Korea,40.52,40.02,45.02,48.02,49.55,50.22,50.9,51.6,52.3,53.02,53.75,54.51,55.27,56.04,56.84,57.67,58.54,59.44,60.35,61.22,62.02,62.73,63.34,63.84,64.26,64.62,64.95,65.31,65.7,66.15,66.66,67.21,67.78,68.37,68.98,69.58,70.18,70.75,71.29,71.8,72.2,72.7,73.1,73.6,74.0,74.5,74.9,75.4,75.8,76.3,76.7,77.1,77.7,78.2,78.7,79.1,79.4,79.8,80.1,80.4,80.6,80.7,80.9,80.9,81.0,81.1 +South Sudan,28.6,29.37,30.11,30.82,31.51,32.17,32.81,33.42,34.02,34.61,35.18,35.75,36.32,36.9,37.48,38.04,38.6,39.15,39.68,40.21,40.75,41.29,41.84,42.39,42.93,43.43,43.87,44.26,44.61,44.93,45.25,45.6,46.01,46.5,47.06,47.72,48.45,49.23,50.05,50.9,51.0,51.6,51.9,52.3,52.7,53.1,53.4,53.8,54.1,54.4,54.7,54.9,55.0,55.2,55.3,55.4,55.5,55.6,55.8,56.0,55.9,56.0,56.0,56.1,56.1,56.1 +Spain,61.5,64.92,65.79,66.98,66.75,66.79,66.63,68.82,68.74,69.23,69.62,69.65,69.81,70.54,70.95,71.2,71.39,71.68,71.21,72.19,71.79,73.0,72.78,73.16,73.49,73.81,74.32,74.51,75.05,75.53,75.67,76.22,76.0,76.38,76.34,76.59,76.82,76.82,76.89,76.9,77.0,77.4,77.6,77.8,77.9,78.1,78.6,78.8,78.8,79.2,79.5,79.6,79.6,80.0,80.3,80.7,80.8,81.1,81.5,81.8,82.0,82.2,82.5,82.5,82.6,82.7 +Sri Lanka,53.25,54.34,55.32,56.22,57.01,57.71,58.32,58.86,59.32,59.76,60.18,60.61,61.06,61.55,62.07,62.62,63.17,63.7,64.21,64.69,65.15,65.56,65.97,66.36,66.76,67.17,67.6,68.06,68.52,68.97,69.35,69.64,69.83,69.93,69.97,70.0,70.05,70.16,70.32,70.5,71.3,72.0,72.9,72.8,71.7,71.3,71.4,72.0,72.4,72.4,73.3,73.7,74.0,69.4,73.9,73.9,74.4,74.0,74.1,75.0,76.4,76.8,77.1,77.4,77.6,77.8 +St. Lucia,51.89,52.09,52.4,52.81,53.32,53.92,54.6,55.36,56.15,56.97,57.75,58.47,59.11,59.66,60.15,60.58,61.0,61.45,61.94,62.47,63.04,63.63,64.22,64.8,65.38,65.96,66.54,67.1,67.64,68.15,68.6,68.99,69.29,69.53,69.7,69.83,69.93,70.03,70.12,70.2,70.4,70.5,70.7,70.9,71.1,71.2,71.5,71.7,71.8,72.0,72.1,72.3,72.5,72.8,73.1,73.4,73.7,74.1,74.3,74.5,74.6,74.7,74.7,74.8,74.8,74.8 +St. 
Vincent and the Grenadines,50.11,50.59,51.19,51.89,52.69,53.58,54.57,55.63,56.73,57.85,58.96,59.99,60.93,61.75,62.46,63.06,63.58,64.04,64.46,64.84,65.16,65.43,65.64,65.82,65.99,66.16,66.36,66.61,66.88,67.19,67.52,67.84,68.15,68.43,68.68,68.92,69.14,69.34,69.53,69.7,69.7,69.7,69.7,69.6,69.6,69.4,69.7,69.8,69.6,69.1,69.7,69.7,70.1,70.2,70.4,70.6,70.8,70.9,71.1,71.1,71.0,71.1,70.8,71.1,71.2,71.3 +Sudan,44.44,45.08,45.71,46.31,46.88,47.45,48.0,48.53,49.04,49.54,50.04,50.52,50.99,51.47,51.94,52.42,52.9,53.36,53.82,54.26,54.68,55.06,55.41,55.73,56.0,56.23,56.44,56.63,56.8,56.95,57.11,57.27,57.44,57.61,57.81,58.01,58.23,58.44,58.67,58.9,59.2,59.4,59.5,60.2,60.5,60.6,60.8,61.2,62.0,62.4,62.8,63.3,63.5,63.7,64.6,64.9,65.3,65.5,65.7,66.1,66.3,66.7,66.9,67.2,67.5,67.8 +Suriname,55.52,56.24,56.93,57.57,58.16,58.72,59.24,59.71,60.16,60.58,61.0,61.41,61.81,62.23,62.65,63.07,63.49,63.89,64.28,64.66,65.0,65.32,65.62,65.91,66.19,66.47,66.76,67.07,67.39,67.71,68.02,68.31,68.57,68.79,68.98,69.15,69.29,69.43,69.57,69.7,69.9,69.8,69.7,69.8,70.1,70.2,70.2,70.1,69.9,69.7,69.5,69.4,69.5,69.7,69.9,70.0,70.1,70.2,70.5,70.7,71.0,71.3,71.6,71.8,72.0,72.2 +Swaziland,41.01,41.51,41.98,42.44,42.88,43.3,43.7,44.08,44.44,44.78,45.1,45.42,45.73,46.05,46.39,46.76,47.2,47.67,48.21,48.79,49.4,50.03,50.67,51.3,51.94,52.58,53.24,53.92,54.62,55.31,56.02,56.71,57.38,58.0,58.6,59.15,59.67,60.13,60.5,60.7,60.7,61.0,61.3,60.7,59.1,57.1,55.8,53.5,51.4,48.8,46.6,45.1,44.0,43.0,42.5,43.1,44.3,45.1,45.9,46.4,48.0,49.1,49.4,49.8,51.8,53.88 +Sweden,71.35,71.84,71.88,72.34,72.58,72.64,72.47,73.11,73.34,73.01,73.47,73.34,73.53,73.7,73.85,74.09,74.12,73.99,74.11,74.66,74.58,74.68,74.83,74.94,74.95,74.96,75.39,75.48,75.52,75.74,76.04,76.36,76.6,76.86,76.72,76.98,77.12,77.01,77.67,77.6,77.7,78.1,78.3,78.5,78.9,79.1,79.4,79.5,79.5,79.7,79.8,80.0,80.2,80.2,80.6,80.8,80.9,81.1,81.2,81.6,81.7,81.8,81.9,82.1,82.1,82.1 +Switzerland,68.72,69.63,69.55,70.02,70.1,70.23,70.58,71.32,71.48,71.46,71.79,71.35,71.34,72.23,72.36,72.5,72.8,72.75,72.76,73.18,73.3,73.82,74.12,74.47,74.86,74.98,75.43,75.39,75.69,75.69,75.92,76.26,76.27,76.87,76.99,77.17,77.47,77.49,77.68,77.5,77.6,77.9,78.3,78.4,78.5,79.1,79.2,79.5,79.8,79.8,80.2,80.4,80.6,81.0,81.3,81.5,81.7,82.0,82.0,82.3,82.6,82.7,82.8,82.9,83.0,83.1 +Syria,47.87,48.44,49.02,49.59,50.15,50.7,51.25,51.79,52.33,52.87,53.43,53.98,54.56,55.15,55.77,56.42,57.12,57.83,58.57,59.31,60.08,60.82,61.56,62.26,62.95,63.6,64.24,64.84,65.44,66.01,66.56,67.08,67.58,68.05,68.51,68.94,69.35,69.75,70.14,70.5,71.0,71.8,72.0,72.3,72.7,73.1,73.4,73.8,74.1,74.4,74.6,74.9,75.1,75.3,75.5,75.7,75.9,76.1,76.3,76.5,75.1,68.1,69.0,67.2,68.2,69.21 +Taiwan,55.11,58.51,60.31,62.01,62.41,62.51,62.41,64.21,64.22,64.42,64.92,65.22,66.02,66.72,67.42,67.42,67.52,67.62,68.62,68.67,69.08,69.38,69.43,69.8,70.05,70.41,70.58,71.15,71.28,71.53,71.63,72.14,72.12,72.79,72.98,73.11,73.4,73.22,73.53,73.8,74.2,74.3,74.5,74.6,74.6,74.7,75.2,75.4,75.3,76.0,76.4,76.9,77.3,77.3,77.4,77.8,78.2,78.4,78.7,79.0,78.8,79.0,79.3,79.4,79.5,79.6 +Tajikistan,52.94,53.4,53.87,54.33,54.79,55.26,55.72,56.17,56.64,57.1,57.57,58.03,58.51,58.98,59.45,59.9,60.34,60.77,61.17,61.55,61.9,62.23,62.53,62.81,63.08,63.34,63.57,63.81,64.04,64.28,64.53,64.8,65.07,65.34,65.55,65.69,65.73,65.67,65.5,65.3,65.3,62.6,64.2,64.1,64.1,63.3,64.8,64.9,65.5,65.8,66.1,66.5,66.9,67.5,68.0,68.7,69.2,69.6,70.0,70.1,70.1,70.8,71.4,71.9,72.4,72.9 
+Tanzania,41.66,42.19,42.69,43.18,43.63,44.05,44.46,44.84,45.22,45.57,45.91,46.26,46.62,46.99,47.37,47.77,48.19,48.62,49.07,49.53,50.03,50.55,51.09,51.65,52.19,52.71,53.19,53.61,53.98,54.29,54.56,54.82,55.05,55.26,55.44,55.54,55.58,55.51,55.39,55.2,55.1,54.7,54.5,54.0,53.9,53.8,53.8,53.7,53.8,54.3,54.8,55.4,55.9,56.5,57.1,57.9,59.1,60.4,60.8,61.4,61.7,61.9,62.7,63.3,64.1,64.91 +Thailand,51.14,51.5,51.9,52.32,52.78,53.28,53.8,54.35,54.91,55.46,56.01,56.51,56.98,57.4,57.8,58.18,58.56,58.96,59.39,59.86,60.33,60.82,61.29,61.77,62.24,62.7,63.15,63.62,64.1,64.62,65.22,65.91,66.69,67.52,68.36,69.15,69.84,70.38,70.76,71.0,71.0,70.9,70.8,70.6,70.6,70.6,70.5,70.4,70.5,70.7,71.2,71.7,72.1,72.2,73.1,73.5,73.8,73.9,74.0,74.2,74.3,74.4,74.4,74.6,74.7,74.8 +Timor-Leste,31.41,32.12,32.83,33.54,34.24,34.94,35.64,36.34,37.04,37.74,38.45,39.15,39.85,40.55,41.29,42.12,43.02,43.96,44.86,45.56,45.8,45.51,44.71,43.49,42.12,40.94,40.25,40.27,41.01,42.45,44.42,46.61,48.76,50.76,52.51,53.99,55.26,56.41,57.47,58.5,59.2,59.9,60.6,61.3,61.8,62.3,62.4,62.8,62.3,60.7,64.4,65.3,65.7,66.5,67.5,68.5,69.2,69.9,70.4,70.8,71.3,71.7,72.0,72.3,72.4,72.5 +Togo,34.69,35.42,36.15,36.86,37.57,38.28,38.98,39.68,40.38,41.06,41.74,42.42,43.1,43.77,44.43,45.09,45.75,46.41,47.07,47.72,48.36,49.0,49.63,50.26,50.88,51.49,52.09,52.7,53.29,53.87,54.43,54.97,55.48,55.96,56.39,56.79,57.14,57.42,57.65,57.8,57.8,57.9,57.8,57.6,57.6,57.3,56.9,56.6,56.8,56.7,56.7,56.7,56.4,56.8,56.8,57.5,57.5,57.5,58.0,58.7,59.6,60.3,60.7,61.1,61.5,61.9 +Tonga,58.0,58.35,58.7,59.05,59.41,59.77,60.12,60.48,60.84,61.2,61.56,61.91,62.26,62.6,62.94,63.27,63.61,63.93,64.26,64.58,64.88,65.17,65.44,65.69,65.93,66.16,66.39,66.61,66.84,67.08,67.32,67.56,67.8,68.04,68.27,68.48,68.67,68.83,68.98,69.1,69.3,69.4,69.5,69.5,69.6,69.7,69.7,69.7,69.6,69.6,69.6,69.7,69.6,69.8,70.0,70.1,70.2,70.3,68.6,70.7,70.8,71.0,71.2,71.3,71.5,71.7 +Trinidad and Tobago,57.36,57.85,58.39,58.98,59.61,60.27,60.97,61.68,62.38,63.07,63.68,64.2,64.63,64.94,65.17,65.31,65.41,65.5,65.6,65.73,65.91,66.11,66.33,66.57,66.83,67.08,67.33,67.54,67.73,67.89,68.03,68.16,68.28,68.39,68.5,68.62,68.74,68.87,68.98,69.1,69.3,69.2,69.3,69.2,69.3,69.3,69.4,69.6,69.3,69.5,69.8,69.9,70.4,70.9,71.1,71.3,71.5,71.7,71.8,71.8,71.9,72.0,72.1,72.3,72.4,72.5 +Tunisia,39.03,39.33,39.68,40.06,40.48,40.94,41.43,41.97,42.56,43.2,43.89,44.65,45.47,46.35,47.31,48.33,49.42,50.56,51.74,52.94,54.16,55.37,56.57,57.75,58.9,60.03,61.15,62.27,63.36,64.41,65.4,66.31,67.13,67.87,68.56,69.2,69.83,70.48,71.13,71.8,72.0,72.2,72.2,72.5,72.9,73.4,73.9,74.3,74.7,75.0,75.3,75.5,75.7,76.0,76.2,76.4,76.6,76.8,77.0,77.1,77.2,77.4,77.5,77.6,77.6,77.6 +Turkey,41.2,41.68,42.2,42.76,43.35,43.99,44.67,45.38,46.13,46.91,47.71,48.52,49.35,50.15,50.96,51.74,52.48,53.21,53.91,54.59,55.27,55.96,56.65,57.36,58.08,58.81,59.55,60.29,61.03,61.74,62.45,63.15,63.82,64.49,65.15,65.76,66.37,66.96,67.53,68.1,68.5,69.2,69.7,69.8,70.0,70.6,71.2,72.0,71.5,73.8,74.4,75.1,75.1,75.8,76.2,76.7,77.4,77.8,78.5,78.8,78.8,79.1,78.8,79.1,79.2,79.3 +Turkmenistan,50.89,51.34,51.79,52.25,52.69,53.14,53.58,54.03,54.47,54.91,55.36,55.82,56.27,56.72,57.17,57.61,58.02,58.42,58.8,59.15,59.46,59.74,60.01,60.25,60.49,60.73,61.0,61.28,61.58,61.9,62.24,62.57,62.89,63.18,63.44,63.63,63.77,63.85,63.89,63.9,63.5,63.5,63.5,63.4,63.3,63.2,63.2,63.3,63.5,63.7,64.1,64.4,64.8,65.3,65.8,66.3,66.8,67.2,67.6,68.1,68.5,68.9,69.2,69.6,70.0,70.4 
+Uganda,39.94,40.51,41.08,41.65,42.24,42.82,43.42,44.03,44.64,45.27,45.91,46.56,47.22,47.86,48.49,49.07,49.58,50.05,50.43,50.74,50.99,51.17,51.33,51.45,51.55,51.65,51.75,51.83,51.93,52.01,52.09,52.14,52.17,52.16,52.09,51.94,51.72,51.42,51.08,50.7,50.0,49.6,49.0,48.5,48.3,48.2,48.5,48.7,48.9,49.1,49.7,50.3,51.2,52.0,53.5,54.9,55.3,56.0,57.0,57.8,58.6,59.3,60.1,60.7,61.3,61.91 +Ukraine,62.2,62.94,63.63,64.42,66.26,67.15,67.19,68.88,69.26,70.88,71.15,70.56,71.18,71.97,71.4,71.66,71.25,71.33,70.7,70.59,70.81,70.57,70.75,70.63,69.96,70.01,69.68,69.63,69.36,69.33,69.36,69.51,69.48,69.17,69.47,70.82,70.61,70.49,70.43,70.0,69.4,68.8,68.3,67.5,66.5,66.7,67.3,68.1,67.7,67.3,67.5,67.5,67.7,67.5,67.1,67.9,67.6,67.8,69.6,70.5,71.1,71.2,71.3,71.3,71.5,71.7 +United Arab Emirates,41.83,43.04,44.22,45.37,46.5,47.62,48.7,49.77,50.82,51.85,52.89,53.91,54.91,55.9,56.87,57.81,58.7,59.54,60.33,61.08,61.78,62.45,63.09,63.7,64.3,64.87,65.41,65.93,66.43,66.91,67.36,67.79,68.2,68.6,68.98,69.34,69.68,70.0,70.31,70.6,70.8,71.1,71.3,71.6,71.9,72.1,72.4,72.8,73.0,73.3,73.6,73.8,74.1,74.4,75.2,75.7,75.6,75.6,75.6,75.6,75.5,75.5,75.4,75.4,75.4,75.4 +United Kingdom,68.26,69.55,69.82,70.19,70.15,70.42,70.54,70.71,70.81,71.02,70.77,70.84,70.74,71.53,71.52,71.43,72.06,71.68,71.64,71.89,72.2,71.98,72.18,72.38,72.65,72.62,73.11,73.04,73.14,73.57,73.9,74.03,74.28,74.66,74.51,74.78,75.12,75.23,75.36,75.7,76.0,76.2,76.3,76.6,76.7,76.9,77.1,77.3,77.5,77.8,78.0,78.2,78.4,78.7,79.0,79.2,79.5,79.7,80.0,80.2,80.5,80.7,80.8,80.9,81.0,81.1 +United States,68.22,68.44,68.79,69.58,69.63,69.71,69.49,69.76,69.98,69.91,70.32,70.21,70.04,70.33,70.41,70.43,70.76,70.42,70.66,70.92,71.24,71.34,71.54,72.08,72.68,72.99,73.38,73.58,74.03,73.93,74.36,74.65,74.71,74.81,74.79,74.87,75.01,75.02,75.1,75.4,75.5,75.8,75.7,75.8,75.9,76.3,76.6,76.8,76.9,76.9,76.9,77.1,77.3,77.6,77.6,77.8,78.1,78.3,78.5,78.8,78.9,79.0,79.1,79.1,79.1,79.1 +Uruguay,65.96,66.11,66.28,66.47,66.69,66.93,67.18,67.43,67.7,67.95,68.19,68.39,68.55,68.67,68.74,68.78,68.8,68.82,68.84,68.88,68.94,69.01,69.1,69.23,69.39,69.58,69.8,70.05,70.32,70.6,70.89,71.17,71.46,71.72,71.97,72.2,72.41,72.61,72.81,73.0,72.6,73.2,73.2,73.3,73.4,73.5,73.7,74.0,74.3,74.6,74.8,75.0,75.0,75.3,75.5,75.7,75.7,76.0,76.2,76.2,76.3,76.3,76.4,76.6,76.8,77.0 +Uzbekistan,55.32,55.78,56.23,56.68,57.13,57.58,58.02,58.46,58.91,59.35,59.8,60.25,60.7,61.15,61.59,62.02,62.43,62.83,63.2,63.54,63.86,64.14,64.4,64.64,64.87,65.11,65.37,65.64,65.94,66.25,66.59,66.93,67.27,67.59,67.85,68.02,68.09,68.06,67.96,67.8,67.6,67.3,67.0,66.7,66.6,66.7,66.9,67.1,67.4,67.6,67.8,67.9,68.1,68.3,68.5,68.8,69.2,69.6,69.9,70.2,70.6,70.9,71.2,71.5,71.8,72.1 +Vanuatu,40.79,41.36,41.94,42.51,43.09,43.67,44.24,44.82,45.4,45.97,46.55,47.14,47.71,48.29,48.87,49.44,50.01,50.56,51.12,51.67,52.21,52.77,53.33,53.89,54.46,55.05,55.64,56.24,56.83,57.41,57.97,58.5,58.98,59.44,59.87,60.27,60.67,61.07,61.48,61.9,62.0,62.1,62.2,62.2,62.3,62.4,61.2,62.5,62.0,62.5,62.5,62.5,62.5,62.6,62.7,62.9,63.2,63.4,63.6,63.9,64.1,64.4,64.6,64.7,64.9,65.1 +Venezuela,54.64,55.24,55.84,56.43,57.03,57.64,58.25,58.86,59.47,60.08,60.69,61.3,61.91,62.51,63.09,63.66,64.22,64.77,65.3,65.8,66.27,66.72,67.14,67.53,67.9,68.23,68.52,68.79,69.04,69.3,69.57,69.85,70.17,70.53,70.89,71.27,71.63,71.95,72.24,72.5,72.4,72.4,72.5,72.4,72.7,73.1,73.6,73.6,70.2,73.8,73.8,73.8,73.5,74.3,74.6,74.5,74.4,74.2,74.4,74.9,74.8,74.6,74.7,74.8,74.8,74.8 
+Vietnam,51.98,52.81,53.6,54.36,55.11,55.83,56.52,57.19,57.86,58.52,59.17,59.82,60.42,60.95,61.32,61.36,61.06,60.45,59.63,58.78,58.17,58.0,58.35,59.23,60.54,62.07,63.58,64.86,65.84,66.49,66.86,67.1,67.3,67.51,67.77,68.07,68.38,68.68,69.0,69.3,69.6,69.8,70.1,70.3,70.6,70.9,71.1,71.5,71.7,72.0,72.2,72.5,72.8,73.0,73.3,73.5,73.8,74.1,74.3,74.5,74.7,74.9,75.0,75.2,75.4,75.6 +Virgin Islands (U.S.),57.9,58.87,59.74,60.54,61.25,61.88,62.44,62.93,63.36,63.75,64.11,64.46,64.82,65.2,65.6,66.02,66.44,66.87,67.29,67.71,68.12,68.53,68.94,69.34,69.73,70.11,70.46,70.8,71.12,71.43,71.74,72.05,72.38,72.71,73.06,73.41,73.75,74.09,74.42,74.73,75.04,75.34,75.64,75.94,76.23,76.52,76.8,77.07,77.33,77.57,77.8,78.0,78.19,78.36,78.52,78.69,78.86,79.05,79.25,79.46,79.69,79.92,80.15,80.38,80.6,80.82 +West Bank and Gaza,47.03,47.31,47.63,47.97,48.36,48.78,49.23,49.72,50.25,50.82,51.43,52.08,52.75,53.47,54.2,54.94,55.7,56.45,57.22,57.97,58.73,59.48,60.26,61.03,61.81,62.6,63.39,64.18,64.96,65.74,66.48,67.21,67.92,68.59,69.23,69.82,70.38,70.88,71.36,71.8,72.0,72.4,72.8,73.3,73.7,74.0,74.2,74.5,74.7,74.4,74.7,74.4,74.4,74.4,74.6,74.4,74.3,74.1,73.8,74.3,74.2,74.2,74.4,74.5,74.6,74.7 +Western Sahara,34.95,35.33,35.72,36.1,36.48,36.86,37.24,37.62,37.99,38.37,38.75,39.12,39.5,39.88,40.26,40.62,40.97,41.32,41.67,42.07,42.52,43.07,43.7,44.43,45.23,46.11,47.01,47.92,48.82,49.72,50.61,51.5,52.4,53.3,54.17,54.99,55.74,56.43,57.04,57.59,58.09,58.56,59.03,59.51,60.0,60.51,61.04,61.57,62.11,62.64,63.15,63.65,64.13,64.58,65.01,65.41,65.79,66.16,66.51,66.84,67.17,67.47,67.76,68.04,68.3,68.56 +Yemen,24.0,24.96,25.92,26.87,27.84,28.8,29.76,30.72,31.68,32.64,33.58,34.52,35.45,36.37,37.27,38.15,39.01,39.87,40.71,41.55,42.4,43.28,44.17,45.1,46.05,47.05,48.06,49.08,50.11,51.13,52.13,53.09,54.02,54.89,55.69,56.4,57.04,57.6,58.08,58.5,58.9,59.3,59.6,59.7,60.3,60.7,61.1,61.5,62.0,62.4,62.8,63.3,63.7,64.2,64.6,65.0,65.2,65.7,66.2,66.6,66.6,66.7,67.1,67.1,66.0,64.92 +Zambia,43.22,43.79,44.38,44.95,45.53,46.1,46.67,47.24,47.79,48.34,48.89,49.42,49.94,50.44,50.96,51.49,52.05,52.64,53.25,53.88,54.51,55.13,55.71,56.24,56.7,57.07,57.36,57.57,57.66,57.62,57.45,57.14,56.71,56.17,55.54,54.85,54.09,53.33,52.59,51.9,50.7,49.6,48.6,47.7,46.9,46.3,45.9,45.4,45.0,44.8,44.9,45.1,45.3,46.3,47.1,47.9,49.0,51.1,52.3,53.1,53.7,54.7,55.6,56.3,56.7,57.1 +Zimbabwe,48.75,49.25,49.75,50.25,50.73,51.22,51.71,52.17,52.64,53.11,53.55,53.99,54.42,54.83,55.25,55.65,56.04,56.43,56.83,57.22,57.63,58.05,58.47,58.92,59.41,59.94,60.53,61.17,61.82,62.48,63.13,63.73,64.23,64.63,64.86,64.9,64.74,64.39,63.81,63.0,62.7,61.4,59.8,58.2,56.0,54.4,52.8,50.9,49.3,47.9,47.0,45.9,45.3,44.7,45.1,45.5,46.4,47.3,48.0,49.1,51.6,54.2,55.7,57.0,59.3,61.69 diff --git a/docs/previous_versions/v0.4.0/data/offshore.csv b/docs/previous_versions/v0.4.0/data/offshore.csv new file mode 100755 index 000000000..5aa096441 --- /dev/null +++ b/docs/previous_versions/v0.4.0/data/offshore.csv @@ -0,0 +1,828 @@ +college_grad,response +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion 
+yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion 
+yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no 
opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion diff --git a/docs/previous_versions/v0.4.0/data/zinc_tidy.csv b/docs/previous_versions/v0.4.0/data/zinc_tidy.csv new file mode 100755 index 000000000..84856e658 --- 
/dev/null +++ b/docs/previous_versions/v0.4.0/data/zinc_tidy.csv @@ -0,0 +1,21 @@ +loc_id,location,concentration +1.0,bottom,0.43 +1.0,surface,0.415 +2.0,bottom,0.266 +2.0,surface,0.238 +3.0,bottom,0.567 +3.0,surface,0.39 +4.0,bottom,0.531 +4.0,surface,0.41 +5.0,bottom,0.707 +5.0,surface,0.605 +6.0,bottom,0.716 +6.0,surface,0.609 +7.0,bottom,0.651 +7.0,surface,0.632 +8.0,bottom,0.589 +8.0,surface,0.523 +9.0,bottom,0.469 +9.0,surface,0.411 +10.0,bottom,0.723 +10.0,surface,0.612 diff --git a/docs/previous_versions/v0.4.0/images/Pie-I-have-Eaten.jpg b/docs/previous_versions/v0.4.0/images/Pie-I-have-Eaten.jpg new file mode 100755 index 000000000..92464e41e Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/Pie-I-have-Eaten.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/apps.jpg b/docs/previous_versions/v0.4.0/images/apps.jpg new file mode 100755 index 000000000..7ef7ea59a Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/apps.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/coggle.png b/docs/previous_versions/v0.4.0/images/coggle.png new file mode 100755 index 000000000..668944334 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/coggle.png differ diff --git a/docs/previous_versions/v0.4.0/images/credit_card_balance_3D_scatterplot.png b/docs/previous_versions/v0.4.0/images/credit_card_balance_3D_scatterplot.png new file mode 100755 index 000000000..054694d97 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/credit_card_balance_3D_scatterplot.png differ diff --git a/docs/previous_versions/v0.4.0/images/credit_card_balance_regression_plane.png b/docs/previous_versions/v0.4.0/images/credit_card_balance_regression_plane.png new file mode 100755 index 000000000..d7037938b Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/credit_card_balance_regression_plane.png differ diff --git a/docs/previous_versions/v0.4.0/images/dashboard.jpg b/docs/previous_versions/v0.4.0/images/dashboard.jpg new file mode 100755 index 000000000..57996bf17 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/dashboard.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp.png b/docs/previous_versions/v0.4.0/images/datacamp.png new file mode 100755 index 000000000..2911de3c4 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_inference_for_categorical_data.png b/docs/previous_versions/v0.4.0/images/datacamp_inference_for_categorical_data.png new file mode 100755 index 000000000..17fcfa240 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_inference_for_categorical_data.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_inference_for_numerical_data.png b/docs/previous_versions/v0.4.0/images/datacamp_inference_for_numerical_data.png new file mode 100755 index 000000000..811743c26 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_inference_for_numerical_data.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_inference_for_regression.png b/docs/previous_versions/v0.4.0/images/datacamp_inference_for_regression.png new file mode 100755 index 000000000..143c4cee8 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_inference_for_regression.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_intermediate_R.png b/docs/previous_versions/v0.4.0/images/datacamp_intermediate_R.png new file 
mode 100755 index 000000000..81b3cf7fb Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_intermediate_R.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_intro_to_R.png b/docs/previous_versions/v0.4.0/images/datacamp_intro_to_R.png new file mode 100755 index 000000000..193664acd Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_intro_to_R.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_intro_to_modeling.png b/docs/previous_versions/v0.4.0/images/datacamp_intro_to_modeling.png new file mode 100755 index 000000000..8bd13337a Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_intro_to_modeling.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_intro_to_tidyverse.png b/docs/previous_versions/v0.4.0/images/datacamp_intro_to_tidyverse.png new file mode 100755 index 000000000..69ca9772a Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_intro_to_tidyverse.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_working_with_data.png b/docs/previous_versions/v0.4.0/images/datacamp_working_with_data.png new file mode 100755 index 000000000..eeb4ac861 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_working_with_data.png differ diff --git a/docs/previous_versions/v0.4.0/images/engine.jpg b/docs/previous_versions/v0.4.0/images/engine.jpg new file mode 100755 index 000000000..597512b49 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/engine.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/errors.png b/docs/previous_versions/v0.4.0/images/errors.png new file mode 100755 index 000000000..43c19d9a3 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/errors.png differ diff --git a/docs/previous_versions/v0.4.0/images/filter.png b/docs/previous_versions/v0.4.0/images/filter.png new file mode 100755 index 000000000..8cd96205d Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/filter.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.009-cropped.png b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.009-cropped.png new file mode 100755 index 000000000..e14558e96 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.009-cropped.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.010-cropped.png b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.010-cropped.png new file mode 100755 index 000000000..0ce574917 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.010-cropped.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.011-cropped.png b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.011-cropped.png new file mode 100755 index 000000000..7c8b6c6a7 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.011-cropped.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.002.png b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.002.png new file mode 100755 index 000000000..71139e1a1 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.002.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.004.png b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.004.png new file mode 100755 index 
000000000..e78715c4d Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.004.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.005.png b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.005.png new file mode 100755 index 000000000..dce19ad70 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.005.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.006.png b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.006.png new file mode 100755 index 000000000..964f0ae8f Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.006.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/infer/calculate.png b/docs/previous_versions/v0.4.0/images/flowcharts/infer/calculate.png new file mode 100755 index 000000000..83b51e66e Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/infer/calculate.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/infer/ci_diagram.png b/docs/previous_versions/v0.4.0/images/flowcharts/infer/ci_diagram.png new file mode 100755 index 000000000..d9baa59f1 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/infer/ci_diagram.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/infer/generate.png b/docs/previous_versions/v0.4.0/images/flowcharts/infer/generate.png new file mode 100755 index 000000000..d81baa6ff Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/infer/generate.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/infer/ht.png b/docs/previous_versions/v0.4.0/images/flowcharts/infer/ht.png new file mode 100755 index 000000000..5effd3674 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/infer/ht.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/infer/ht_diagram.png b/docs/previous_versions/v0.4.0/images/flowcharts/infer/ht_diagram.png new file mode 100755 index 000000000..582bdad19 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/infer/ht_diagram.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/infer/specify.png b/docs/previous_versions/v0.4.0/images/flowcharts/infer/specify.png new file mode 100755 index 000000000..7f68e18b7 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/infer/specify.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/infer/visualize.png b/docs/previous_versions/v0.4.0/images/flowcharts/infer/visualize.png new file mode 100755 index 000000000..895426ff3 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/infer/visualize.png differ diff --git a/docs/previous_versions/v0.4.0/images/group_summary.png b/docs/previous_versions/v0.4.0/images/group_summary.png new file mode 100755 index 000000000..2f09b0f0f Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/group_summary.png differ diff --git a/docs/previous_versions/v0.4.0/images/guess_the_correlation.png b/docs/previous_versions/v0.4.0/images/guess_the_correlation.png new file mode 100755 index 000000000..fefdb23b1 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/guess_the_correlation.png differ diff --git a/docs/previous_versions/v0.4.0/images/ht.png b/docs/previous_versions/v0.4.0/images/ht.png new file mode 
100755 index 000000000..204422828 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/ht.png differ diff --git a/docs/previous_versions/v0.4.0/images/iphone.jpg b/docs/previous_versions/v0.4.0/images/iphone.jpg new file mode 100755 index 000000000..cf3a222a0 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/iphone.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/ismay.jpeg b/docs/previous_versions/v0.4.0/images/ismay.jpeg new file mode 100755 index 000000000..f68ead9ed Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/ismay.jpeg differ diff --git a/docs/previous_versions/v0.4.0/images/join-inner.png b/docs/previous_versions/v0.4.0/images/join-inner.png new file mode 100755 index 000000000..18e996daa Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/join-inner.png differ diff --git a/docs/previous_versions/v0.4.0/images/kim.jpeg b/docs/previous_versions/v0.4.0/images/kim.jpeg new file mode 100755 index 000000000..524aff3d5 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/kim.jpeg differ diff --git a/docs/previous_versions/v0.4.0/images/logos/book_cover.png b/docs/previous_versions/v0.4.0/images/logos/book_cover.png new file mode 100755 index 000000000..f20fd9ef6 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/logos/book_cover.png differ diff --git a/docs/previous_versions/v0.4.0/images/logos/favicons/apple-touch-icon.png b/docs/previous_versions/v0.4.0/images/logos/favicons/apple-touch-icon.png new file mode 100755 index 000000000..d28831d0b Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/logos/favicons/apple-touch-icon.png differ diff --git a/docs/previous_versions/v0.4.0/images/logos/favicons/favicon.ico b/docs/previous_versions/v0.4.0/images/logos/favicons/favicon.ico new file mode 100755 index 000000000..bddb10a6f Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/logos/favicons/favicon.ico differ diff --git a/docs/previous_versions/v0.4.0/images/mutate.png b/docs/previous_versions/v0.4.0/images/mutate.png new file mode 100755 index 000000000..ab15762b8 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/mutate.png differ diff --git a/docs/previous_versions/v0.4.0/images/read_excel.png b/docs/previous_versions/v0.4.0/images/read_excel.png new file mode 100755 index 000000000..e9467bb82 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/read_excel.png differ diff --git a/docs/previous_versions/v0.4.0/images/relational-nycflights.png b/docs/previous_versions/v0.4.0/images/relational-nycflights.png new file mode 100755 index 000000000..10b04ce0f Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/relational-nycflights.png differ diff --git a/docs/previous_versions/v0.4.0/images/rstudio.png b/docs/previous_versions/v0.4.0/images/rstudio.png new file mode 100755 index 000000000..e1d286545 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/rstudio.png differ diff --git a/docs/previous_versions/v0.4.0/images/sampling/shovel_025.jpg b/docs/previous_versions/v0.4.0/images/sampling/shovel_025.jpg new file mode 100755 index 000000000..df2c5e1d2 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling/shovel_025.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling/shovel_050.jpg b/docs/previous_versions/v0.4.0/images/sampling/shovel_050.jpg new file mode 100755 index 000000000..68787cf3d Binary files /dev/null and 
b/docs/previous_versions/v0.4.0/images/sampling/shovel_050.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling/shovel_100.jpg b/docs/previous_versions/v0.4.0/images/sampling/shovel_100.jpg new file mode 100755 index 000000000..1cc70a70f Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling/shovel_100.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling/tactile_1_b.jpg b/docs/previous_versions/v0.4.0/images/sampling/tactile_1_b.jpg new file mode 100755 index 000000000..9a045406f Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling/tactile_1_b.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling/tactile_2_a.jpg b/docs/previous_versions/v0.4.0/images/sampling/tactile_2_a.jpg new file mode 100755 index 000000000..45b2791a9 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling/tactile_2_a.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling/tactile_3_a.jpg b/docs/previous_versions/v0.4.0/images/sampling/tactile_3_a.jpg new file mode 100755 index 000000000..50ef8b56f Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling/tactile_3_a.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling/tactile_3_c.jpg b/docs/previous_versions/v0.4.0/images/sampling/tactile_3_c.jpg new file mode 100755 index 000000000..bd20120f3 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling/tactile_3_c.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling_bowl_2.jpg b/docs/previous_versions/v0.4.0/images/sampling_bowl_2.jpg new file mode 100755 index 000000000..48412bcfd Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling_bowl_2.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling_bowl_3_cropped.jpg b/docs/previous_versions/v0.4.0/images/sampling_bowl_3_cropped.jpg new file mode 100755 index 000000000..a38e5d063 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling_bowl_3_cropped.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/select.png b/docs/previous_versions/v0.4.0/images/select.png new file mode 100755 index 000000000..a7329274a Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/select.png differ diff --git a/docs/previous_versions/v0.4.0/images/sign-2408065_1920.png b/docs/previous_versions/v0.4.0/images/sign-2408065_1920.png new file mode 100755 index 000000000..824dc86f0 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sign-2408065_1920.png differ diff --git a/docs/previous_versions/v0.4.0/images/summarize1.png b/docs/previous_versions/v0.4.0/images/summarize1.png new file mode 100755 index 000000000..e52e1d984 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/summarize1.png differ diff --git a/docs/previous_versions/v0.4.0/images/summary.png b/docs/previous_versions/v0.4.0/images/summary.png new file mode 100755 index 000000000..86415225e Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/summary.png differ diff --git a/docs/previous_versions/v0.4.0/images/tidy-1.png b/docs/previous_versions/v0.4.0/images/tidy-1.png new file mode 100755 index 000000000..4287d74c6 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/tidy-1.png differ diff --git a/docs/previous_versions/v0.4.0/images/tidy1.png b/docs/previous_versions/v0.4.0/images/tidy1.png new file mode 100755 index 000000000..88771ff58 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/tidy1.png 
differ diff --git a/docs/previous_versions/v0.4.0/index.html b/docs/previous_versions/v0.4.0/index.html new file mode 100644 index 000000000..3fdbeb151 --- /dev/null +++ b/docs/previous_versions/v0.4.0/index.html @@ -0,0 +1,941 @@ + + + + + + + + An Introduction to Statistical and Data Sciences via R + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

      1 Introduction

      + +
      +
      +

      1.1 Important Note

      +

      This is a previous version (v0.4.0) of ModernDive and may be out of date. For the current version of ModernDive, please go to ModernDive.com.

      +
      +


      +

      Help! I’m new to R and RStudio and I need to learn about them! However, I’m completely new to coding! What do I do? If you’re asking yourself this question, then you’ve come to the right place! Start with our Introduction for Students.

      +
        +
      • Are you an instructor hoping to use this book in your courses? Then click here for more information on how to teach with this book.
      • +
      • Are you looking to connect with and contribute to ModernDive? Then click here for information on how.
      • +
      • Are you curious about the publishing of this book? Then click here for more information on the open-source technology, in particular R Markdown and the bookdown package.
      • +
      +

      This is version 0.4.0 of ModernDive published on July 21, 2018. For previous versions of ModernDive, see Section 1.6.

      +
      +
      +
      +

      1.2 Introduction for students

      +

      This book assumes no prerequisites: no algebra, no calculus, and no prior programming/coding experience. This is intended to be a gentle introduction to the practice of analyzing data and answering questions using data the way data scientists, statisticians, data journalists, and other researchers would.

      +

      In Figure 1.1 we present a flowchart of what you’ll cover in this book. You’ll first get started with data in Chapter 2, where you’ll learn about the difference between R and RStudio, start coding in R, understand what R packages are, and explore your first dataset: all domestic departure flights from a New York City airport in 2013. Then

      +
        +
      1. Data science: You’ll assemble your data science toolbox using tidyverse packages. In particular: +
          +
        • Ch.3: Visualizing data via the ggplot2 package.
        • +
        • Ch.4: Understanding the concept of “tidy” data as a standardized data input format for all packages in the tidyverse
        • +
        • Ch.5: Wrangling data via the dplyr package.
        • +
      2. +
      3. Data modeling: Using these data science tools and helper functions from the moderndive package, you’ll start performing data modeling. In particular: +
          +
        • Ch.6: Constructing basic regression models.
        • +
        • Ch.7: Constructing multiple regression models.
        • +
      4. +
      5. Statistical inference: Once again using your newly acquired data science tools, we’ll unpack statistical inference using the infer package. In particular: +
          +
        • Ch.8: Understanding the role that sampling variability plays in statistical inference using both tactile and virtual simulations of sampling from a “bowl” with an unknown proportion of red balls.
        • +
        • Ch.9: Building confidence intervals.
        • +
        • Ch.10: Conducting hypothesis tests.
        • +
      6. +
      7. Data modeling revisited: Armed with your new understanding of statistical inference, you’ll revisit and review the models you constructed in Ch.6 & Ch.7. In particular: +
          +
        • Ch.11: Interpreting both the statistical and practical significance of the results of the models.
        • +
      8. +
      +

      We’ll end with a discussion on what it means to “think with data” in Chapter 12 and present an example case study data analysis of house prices in Seattle.

      +

      +Figure 1.1: ModernDive Flowchart +
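
      To give a flavor of the toolbox outlined above, here is a minimal sketch of the kind of code you will build up to in the early chapters. It assumes only that the nycflights13 and ggplot2 packages mentioned in this chapter are installed; the book’s own examples may differ.

```r
# A hedged sketch, not the book's exact code: load the 2013 NYC flights data
# referenced above and draw a first ggplot2 graphic.
library(nycflights13)
library(ggplot2)

# Do flights that depart late also tend to arrive late?
ggplot(flights, aes(x = dep_delay, y = arr_delay)) +
  geom_point(alpha = 0.2)
```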

      +
      +
      +

      1.2.1 What you will learn from this book

      +

      We hope that by the end of this book, you’ll have learned

      +
        +
      1. How to use R to explore data.
        +
      2. +
      3. How to answer statistical questions using tools like confidence intervals and hypothesis tests.
      4. +
      5. How to effectively create “data stories” using these tools.
      6. +
      +

      What do we mean by data stories? We mean any analysis involving data that engages the reader in answering questions with careful visuals and thoughtful discussion, such as How strong is the relationship between per capita income and crime in Chicago neighborhoods? and How many f**ks does Quentin Tarantino give (as measured by the amount of swearing in his films)?. Further discussions on data stories can be found in this Think With Google article.

      +

      For other examples of data stories constructed by students like yourselves, look at the final projects for two courses that have previously used ModernDive:

      + +

      This book will help you develop your “data science toolbox”, including tools such as data visualization, data formatting, data wrangling, and data modeling using regression. With these tools, you’ll be able to perform the entirety of the “data/science pipeline” while building data communication skills (see Subsection 1.2.2 for more details).
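
      As a small, hedged taste of the data wrangling piece of this toolbox (a sketch only, using the dplyr and nycflights13 packages named in this chapter), here is the kind of one-step summary you will learn to write:

```r
# Sketch: average departure delay for each New York City airport of origin.
library(dplyr)
library(nycflights13)

flights %>%
  group_by(origin) %>%
  summarize(mean_dep_delay = mean(dep_delay, na.rm = TRUE))
```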

      +

      In particular, this book will lean heavily on data visualization. In today’s world, we are bombarded with graphics that attempt to convey ideas. We will explore what makes a good graphic and what the standard ways are to convey relationships with data. You’ll also see the use of visualization to introduce concepts like mean, median, standard deviation, distributions, etc. In general, we’ll use visualization as a way of building almost all of the ideas in this book.

      +

      To impart the statistical lessons in this book, we have intentionally minimized the number of mathematical formulas used and instead have focused on developing a conceptual understanding via data visualization, statistical computing, and simulations. We hope this is a more intuitive experience than the way statistics has traditionally been taught in the past and how it is commonly perceived.

      +

      Finally, you’ll learn the importance of literate programming. By this we mean you’ll learn how to write code that is useful not just for a computer to execute but also for readers to understand exactly what your analysis is doing and how you did it. This is part of a greater effort to encourage reproducible research (see Subsection 1.2.3 for more details). Hal Abelson coined the phrase that we will follow throughout this book:

      +
      +

      “Programs must be written for people to read, and only incidentally for machines to execute.”

      +
      +

      We understand that there may be challenging moments as you learn to program. Both of us continue to struggle and find ourselves often using web searches to find answers and reach out to colleagues for help. In the long run though, we all can solve problems faster and more elegantly via programming. We wrote this book as our way to help you get started and you should know that there is a huge community of R users that are always happy to help everyone along as well. This community exists in particular on the internet on various forums and websites such as stackoverflow.com.

      +
      +
      +

      1.2.2 Data/science pipeline

      +

      You may think of statistics as just being a bunch of numbers. We commonly hear the phrase “statistician” when listening to broadcasts of sporting events. Statistics (in particular, data analysis), in addition to describing numbers such as baseball batting averages, plays a vital role in all of the sciences. You’ll commonly hear the phrase “statistically significant” thrown around in the media. You’ll see articles that say “Science now shows that chocolate is good for you.” Underpinning these claims is data analysis. By the end of this book, you’ll be able to better understand whether these claims should be trusted or whether we should be wary. Inside data analysis are many sub-fields that we will discuss throughout this book (though not necessarily in this order):

      +
        +
      • data collection
      • +
      • data wrangling
      • +
      • data visualization
      • +
      • data modeling
      • +
      • inference
      • +
      • correlation and regression
      • +
      • interpretation of results
      • +
      • data communication/storytelling
      • +
      +

      These sub-fields are summarized in what Grolemund and Wickham term the “Data/Science Pipeline” in Figure 1.2.

      +

      +Figure 1.2: Data/Science Pipeline +

      +
      +

      We will begin by digging into the gray Understand portion of the cycle with data visualization, then with a discussion on what is meant by tidy data and data wrangling, and then conclude by talking about interpreting and discussing the results of our models via Communication. These steps are vital to any statistical analysis. But why should you care about statistics? “Why did they make me take this class?”

      +

      There’s a reason so many fields require a statistics course. Scientific knowledge grows through an understanding of statistical significance and data analysis. You needn’t be intimidated by statistics. It’s not the beast that it used to be and, paired with computation, you’ll see how reproducible research in the sciences particularly increases scientific knowledge.

      +
      +
      +

      1.2.3 Reproducible research

      +
      +

      “The most important tool is the mindset, when starting, that the end product will be reproducible.” – Keith Baggerly

      +
      +

      Another goal of this book is to help readers understand the importance of reproducible analyses. The hope is to get readers into the habit of making their analyses reproducible from the very beginning. This means we’ll be trying to help you build new habits. This will take practice and be difficult at times. You’ll see just why it is so important for you to keep track of your code and document it well, both to help yourself later and to help any potential collaborators.

      +

      Copying and pasting results from one program into a word processor is not the way that efficient and effective scientific research is conducted. It’s much more important for time to be spent on data collection and data analysis and not on copying and pasting plots back and forth across a variety of programs.

      +

      In a traditional analysis, if an error was made with the original data, we’d need to step through the entire process again: recreate the plots and copy and paste all of the new plots and our statistical analysis back into our document. This is error-prone and a frustrating use of time. We’ll see how to use R Markdown to get away from this tedious activity so that we can spend more time doing science.

      +
      +

      “We are talking about computational reproducibility.” - Yihui Xie

      +
      +

      Reproducibility means a lot of things in terms of different scientific fields. Are experiments conducted in a way that another researcher could follow the steps and get similar results? In this book, we will focus on what is known as computational reproducibility. This refers to being able to pass all of one’s data analysis, data-sets, and conclusions to someone else and have them get exactly the same results on their machine. This allows for time to be spent interpreting results and considering assumptions instead of the more error prone way of starting from scratch or following a list of steps that may be different from machine to machine.
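
      For a tiny, concrete illustration of computational reproducibility (a sketch, not an example from the book): because the random seed is fixed below, anyone who runs this code gets exactly the same simulated result on their machine.

```r
# Fixing the seed makes the simulated result identical on every machine,
# which is the core idea behind computational reproducibility.
set.seed(76)
simulated_mean <- mean(rnorm(n = 1000, mean = 0, sd = 1))
simulated_mean
```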

      + +
      +
      +

      1.2.4 Final note for students

      +

      At this point, if you are interested in instructor perspectives on this book, ways to contribute and collaborate, or the technical details of this book’s construction and publishing, then continue with the rest of the chapter below. Otherwise, let’s get started with R and RStudio in Chapter 2!


      1.3 Introduction for instructors


      This book is inspired by the following books:

• “Mathematical Statistics with Resampling and R” (Chihara and Hesterberg 2011),
• “OpenIntro: Intro Stat with Randomization and Simulation” (Diez, Barr, and Çetinkaya-Rundel 2014), and
• “R for Data Science” (Grolemund and Wickham 2016).

The first book, while designed for upper-level undergraduates and graduate students, provides an excellent resource on how to use resampling to impart statistical concepts like sampling distributions, using computation instead of large-sample approximations and other mathematical formulas. The last two books are free options for learning introductory statistics and data science, providing an alternative to the many traditionally expensive introductory statistics textbooks.


When looking over the large number of introductory statistics textbooks that currently exist, we found that there wasn’t one that incorporated many newly developed R packages directly into the text, in particular the many packages included in the tidyverse collection, such as ggplot2, dplyr, tidyr, and broom. Additionally, there wasn’t an open-source and easily reproducible textbook available that exposed new learners to all three of the learning goals listed at the outset of Subsection 1.2.1.


      1.3.1 Who is this book for?


This book is intended for instructors of traditional introductory statistics classes using RStudio, either the desktop or server version, who would like to inject more data science topics into their syllabus. We assume that students taking the class will have no prior algebra, calculus, or programming/coding experience.


      Here are some principles and beliefs we kept in mind while writing this text. If you agree with them, this might be the book for you.

1. Blur the lines between lecture and lab

  • With increased availability and accessibility of laptops and open-source non-proprietary statistical software, the strict dichotomy between lab and lecture can be loosened.
  • It’s much harder for students to understand the importance of using software if they only use it once a week or less. They forget the syntax in much the same way someone learning a foreign language forgets the rules. Frequent reinforcement is key.

2. Focus on the entire data/science research pipeline

3. It’s all about the data

  • We leverage R packages for rich, real, and realistic data-sets that at the same time are easy-to-load into R, such as the nycflights13 and fivethirtyeight packages (see the short sketch after this list).
  • We believe that data visualization is a gateway drug for statistics and that the Grammar of Graphics as implemented in the ggplot2 package is the best way to impart such lessons. However, we often hear: “You can’t teach ggplot2 for data visualization in intro stats!” We, like David Robinson, are much more optimistic.
  • dplyr has made data wrangling much more accessible to novices, and hence much more interesting data-sets can be explored.

4. Use simulation/resampling to introduce statistical inference, not probability/mathematical formulas

  • Instead of using formulas, large-sample approximations, and probability tables, we teach statistical concepts using resampling-based inference.
  • This allows for a de-emphasis of traditional probability topics, freeing up room in the syllabus for other topics.

5. Don’t fence off students from the computation pool, throw them in!

  • Computing skills are essential to working with data in the 21st century. Given this fact, we feel that to shield students from computing is to ultimately do them a disservice.
  • We are not teaching a course on coding/programming per se, but rather just enough of the computational and algorithmic thinking necessary for data analysis.

6. Complete reproducibility and customizability

  • We are frustrated when textbooks give examples, but not the source code and the data itself. We give you the source code for all examples as well as the whole book!
  • Ultimately the best textbook is one you’ve written yourself. You know best your audience, their background, and their priorities. You know best your own style and the types of examples and problems you like best. Customization is the ultimate end. For more about how to make this book your own, see About this Book.
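
As a brief illustration of these principles in action, here is a small sketch, assuming the nycflights13, dplyr, and ggplot2 packages are installed (the particular variables and carrier chosen are only illustrative), of the kind of real-data workflow the book builds toward:

```r
library(nycflights13)   # all domestic flights departing New York City in 2013
library(dplyr)
library(ggplot2)

# Data wrangling with dplyr: mean arrival delay for each carrier
flights %>%
  group_by(carrier) %>%
  summarize(mean_arr_delay = mean(arr_delay, na.rm = TRUE)) %>%
  arrange(desc(mean_arr_delay))

# Data visualization with ggplot2: departure vs. arrival delays
# for a single carrier
ggplot(flights %>% filter(carrier == "AS"),
       aes(x = dep_delay, y = arr_delay)) +
  geom_point()
```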

      1.4 DataCamp


DataCamp is a browser-based interactive platform for learning data science, offering a wide array of courses on data science, analytics, statistics, machine learning, and artificial intelligence. Each course is a combination of lectures and exercises that offer immediate feedback.


The chapters of ModernDive roughly map to the following closely integrated DataCamp courses, which use the same R tools and often even the same datasets. This is by no means an exhaustive list of DataCamp courses relevant to the topics in this book; rather, these are the ones we recommend in particular to supplement your ModernDive experience.


Click on the image for each course to access its webpage on datacamp.com. Instructors at accredited universities can sign their class up for a free academic license at DataCamp For The Classroom, giving their students free access to all premium courses for 6 months.

| Chapter | Topic | DataCamp Courses |
|---------|-------|------------------|
| 2 | Basic R programming concepts | (two course images) |
| 3 & 5 | Introductory data visualization and wrangling | (course image) |
| 4 & 5 | Data “tidying” and intermediate data wrangling | (course image) |
| 6 & 7 | Data modeling, basic regression, and multiple regression | (course image) |
| 9 & 10 | Statistical inference: confidence intervals and hypothesis testing | (two course images) |
| 11 | Inference for regression | (course image) |

      1.5 Connect and contribute


      If you would like to connect with ModernDive, check out the following links:


      If you would like to contribute to ModernDive, there are many ways! Let’s all work together to make this book as great as possible for as many students and instructors as possible!

• Please let us know if you find any errors, typos, or areas for improvement on our GitHub issues page.
• If you are familiar with GitHub and would like to contribute more, please see Section 1.6 below.

      The authors would like to thank Nina Sonneborn, Kristin Bott, and the participants of our USCOTS 2017 workshop for their feedback and suggestions. A special thanks goes to Prof. Yana Weinstein, cognitive psychological scientist and co-founder of The Learning Scientists, for her extensive contributions.


      1.6 About this book


      This book was written using RStudio’s bookdown package by Yihui Xie (Xie 2018). This package simplifies the publishing of books by having all content written in R Markdown. The bookdown/R Markdown source code for all versions of ModernDive is available on GitHub:


Could this be a new paradigm for textbooks? Instead of the traditional model of textbook companies publishing updated editions every few years, we apply a model influenced by software design, publishing more easily updated versions. We can then leverage open-source communities of instructors and developers for ideas, tools, resources, and feedback. As such, we welcome your pull requests.


      Finally, feel free to modify the book as you wish for your own needs, but please list the authors at the top of index.Rmd as “Chester Ismay, Albert Y. Kim, and YOU!”
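
For instructors who do want to customize the book, here is a minimal sketch of building a local copy, assuming you have cloned the book’s source repository and installed the packages it uses (the output formats themselves are configured in the repository’s _output.yml):

```r
install.packages("bookdown")         # once
bookdown::render_book("index.Rmd")   # rebuild the whole book from its .Rmd sources
```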


      1.7 About the authors


      Who we are!

Chester Ismay and Albert Y. Kim (author photos)
e=this.maindiv_;this.graphDiv=document.createElement("div"),this.graphDiv.style.textAlign="left",this.graphDiv.style.position="relative",e.appendChild(this.graphDiv),this.canvas_=t.createCanvas(),this.canvas_.style.position="absolute",this.hidden_=this.createPlotKitCanvas_(this.canvas_),this.canvas_ctx_=t.getContext(this.canvas_),this.hidden_ctx_=t.getContext(this.hidden_),this.resizeElements_(),this.graphDiv.appendChild(this.hidden_),this.graphDiv.appendChild(this.canvas_),this.mouseEventElement_=this.createMouseEventElement_(),this.layout_=new DygraphLayout(this);var a=this;this.mouseMoveHandler_=function(t){a.mouseMove_(t)},this.mouseOutHandler_=function(e){var i=e.target||e.fromElement,r=e.relatedTarget||e.toElement;t.isNodeContainedBy(i,a.graphDiv)&&!t.isNodeContainedBy(r,a.graphDiv)&&a.mouseOut_(e)},this.addAndTrackEvent(window,"mouseout",this.mouseOutHandler_),this.addAndTrackEvent(this.mouseEventElement_,"mousemove",this.mouseMoveHandler_),this.resizeHandler_||(this.resizeHandler_=function(t){a.resize()},this.addAndTrackEvent(window,"resize",this.resizeHandler_))},t.prototype.resizeElements_=function(){this.graphDiv.style.width=this.width_+"px",this.graphDiv.style.height=this.height_+"px";var e=t.getContextPixelRatio(this.canvas_ctx_);this.canvas_.width=this.width_*e,this.canvas_.height=this.height_*e,this.canvas_.style.width=this.width_+"px",this.canvas_.style.height=this.height_+"px",1!==e&&this.canvas_ctx_.scale(e,e);var a=t.getContextPixelRatio(this.hidden_ctx_);this.hidden_.width=this.width_*a,this.hidden_.height=this.height_*a,this.hidden_.style.width=this.width_+"px",this.hidden_.style.height=this.height_+"px",1!==a&&this.hidden_ctx_.scale(a,a)},t.prototype.destroy=function(){this.canvas_ctx_.restore(),this.hidden_ctx_.restore();for(var e=this.plugins_.length-1;e>=0;e--){var a=this.plugins_.pop();a.plugin.destroy&&a.plugin.destroy()}var i=function(t){for(;t.hasChildNodes();)i(t.firstChild),t.removeChild(t.firstChild)};this.removeTrackedEvents_(),t.removeEvent(window,"mouseout",this.mouseOutHandler_),t.removeEvent(this.mouseEventElement_,"mousemove",this.mouseMoveHandler_),t.removeEvent(window,"resize",this.resizeHandler_),this.resizeHandler_=null,i(this.maindiv_);var r=function(t){for(var e in t)"object"==typeof t[e]&&(t[e]=null)};r(this.layout_),r(this.plotter_),r(this)},t.prototype.createPlotKitCanvas_=function(e){var a=t.createCanvas();return a.style.position="absolute",a.style.top=e.style.top,a.style.left=e.style.left,a.width=this.width_,a.height=this.height_,a.style.width=this.width_+"px",a.style.height=this.height_+"px",a},t.prototype.createMouseEventElement_=function(){if(this.isUsingExcanvas_){var t=document.createElement("div");return t.style.position="absolute",t.style.backgroundColor="white",t.style.filter="alpha(opacity=0)",t.style.width=this.width_+"px",t.style.height=this.height_+"px",this.graphDiv.appendChild(t),t}return this.canvas_},t.prototype.setColors_=function(){var e=this.getLabels(),a=e.length-1;this.colors_=[],this.colorsMap_={};for(var i=this.getNumericOption("colorSaturation")||1,r=this.getNumericOption("colorValue")||.5,n=Math.ceil(a/2),o=this.getOption("colors"),s=this.visibility(),l=0;a>l;l++)if(s[l]){ +var h=e[l+1],p=this.attributes_.getForSeries("color",h);if(!p)if(o)p=o[l%o.length];else{var g=l%2?n+(l+1)/2:Math.ceil((l+1)/2),d=1*g/(1+a);p=t.hsvToRGB(d,i,r)}this.colors_.push(p),this.colorsMap_[h]=p}},t.prototype.getColors=function(){return this.colors_},t.prototype.getPropertiesForSeries=function(t){for(var 
e=-1,a=this.getLabels(),i=1;i=o;o++)s=t.zoomAnimationFunction(o,l),h[o-1]=[e[0]*(1-s)+s*a[0],e[1]*(1-s)+s*a[1]];if(null!==i&&null!==r)for(o=1;l>=o;o++){s=t.zoomAnimationFunction(o,l);for(var g=[],d=0;dl;l++){var h=o[l];if(t.isValidPoint(h,!0)){var p=Math.abs(h.canvasx-e);a>p&&(a=p,i=h.idx)}}return i},t.prototype.findClosestPoint=function(e,a){for(var i,r,n,o,s,l,h,p=1/0,g=this.layout_.points.length-1;g>=0;--g)for(var d=this.layout_.points[g],u=0;ui&&(p=i,s=o,l=g,h=o.idx));var c=this.layout_.setNames[l];return{row:h,seriesName:c,point:s}},t.prototype.findStackedPoint=function(e,a){for(var i,r,n=this.findClosestRow(e),o=0;o=h.length)){var p=h[l];if(t.isValidPoint(p)){var g=p.canvasy;if(e>p.canvasx&&l+10){var c=(e-p.canvasx)/u;g+=c*(d.canvasy-p.canvasy)}}}else if(e0){var y=h[l-1];if(t.isValidPoint(y)){var u=p.canvasx-y.canvasx;if(u>0){var c=(p.canvasx-e)/u;g+=c*(y.canvasy-p.canvasy)}}}(0===o||a>g)&&(i=p,r=o)}}}var _=this.layout_.setNames[r];return{row:n,seriesName:_,point:i}},t.prototype.mouseMove_=function(t){var e=this.layout_.points;if(void 0!==e&&null!==e){var a=this.eventToDomCoords(t),i=a[0],r=a[1],n=this.getOption("highlightSeriesOpts"),o=!1;if(n&&!this.isSeriesLocked()){var s;s=this.getBooleanOption("stackedGraph")?this.findStackedPoint(i,r):this.findClosestPoint(i,r),o=this.setSelection(s.row,s.seriesName)}else{var l=this.findClosestRow(i);o=this.setSelection(l)}var h=this.getFunctionOption("highlightCallback");h&&o&&h.call(this,t,this.lastx_,this.selPoints_,this.lastRow_,this.highlightSet_)}},t.prototype.getLeftBoundary_=function(t){if(this.boundaryIds_[t])return this.boundaryIds_[t][0];for(var e=0;ee?r:a-r;if(0>=n)return void(this.fadeLevel&&this.updateSelection_(1));var o=++this.animateId,s=this;t.repeatAndCleanup(function(t){s.animateId==o&&(s.fadeLevel+=e,0===s.fadeLevel?s.clearSelection():s.updateSelection_(s.fadeLevel/a))},n,i,function(){})},t.prototype.updateSelection_=function(e){this.cascadeEvents_("select",{selectedRow:this.lastRow_,selectedX:this.lastx_,selectedPoints:this.selPoints_});var a,i=this.canvas_ctx_;if(this.getOption("highlightSeriesOpts")){i.clearRect(0,0,this.width_,this.height_);var r=1-this.getNumericOption("highlightSeriesBackgroundAlpha");if(r){var n=!0;if(n){if(void 0===e)return void this.animateSelection_(1);r*=e}i.fillStyle="rgba(255,255,255,"+r+")",i.fillRect(0,0,this.width_,this.height_)}this.plotter_._renderLineChart(this.highlightSet_,i)}else if(this.previousVerticalX_>=0){var o=0,s=this.attr_("labels");for(a=1;ao&&(o=l)}var h=this.previousVerticalX_;i.clearRect(h-o-1,0,2*o+2,this.height_)}if(this.isUsingExcanvas_&&this.currentZoomRectArgs_&&t.prototype.drawZoomRect_.apply(this,this.currentZoomRectArgs_),this.selPoints_.length>0){var p=this.selPoints_[0].canvasx;for(i.save(),a=0;a=0){t!=this.lastRow_&&(i=!0),this.lastRow_=t;for(var r=0;r=0&&(i=!0),this.lastRow_=-1;return this.selPoints_.length?this.lastx_=this.selPoints_[0].xval:this.lastx_=-1,void 0!==e&&(this.highlightSet_!==e&&(i=!0),this.highlightSet_=e),void 0!==a&&(this.lockedSet_=a),i&&this.updateSelection_(void 0),i},t.prototype.mouseOut_=function(t){this.getFunctionOption("unhighlightCallback")&&this.getFunctionOption("unhighlightCallback").call(this,t),this.getBooleanOption("hideOverlayOnMouseOut")&&!this.lockedSet_&&this.clearSelection()},t.prototype.clearSelection=function(){return this.cascadeEvents_("deselect",{}),this.lockedSet_=!1,this.fadeLevel?void 
this.animateSelection_(-1):(this.canvas_ctx_.clearRect(0,0,this.width_,this.height_),this.fadeLevel=0,this.selPoints_=[],this.lastx_=-1,this.lastRow_=-1,void(this.highlightSet_=null))},t.prototype.getSelection=function(){if(!this.selPoints_||this.selPoints_.length<1)return-1;for(var t=0;t1&&(a=this.dataHandler_.rollingAverage(a,this.rollPeriod_,this.attributes_)),this.rolledSeries_.push(a)}this.drawGraph_();var i=new Date;this.drawingTimeMs_=i-t},t.PointType=void 0,t.stackPoints_=function(t,e,a,i){for(var r=null,n=null,o=null,s=-1,l=function(e){if(!(s>=e))for(var a=e;aa[1]&&(a[1]=u),u=1;i--)if(this.visibility()[i-1]){if(a){l=e[i];var c=a[0],y=a[1];for(n=null,o=null,r=0;r=c&&null===n&&(n=r),l[r][0]<=y&&(o=r);null===n&&(n=0);for(var _=n,v=!0;v&&_>0;)_--,v=null===l[_][1];null===o&&(o=l.length-1);var f=o;for(v=!0;v&&f0&&(this.setIndexByName_[n[0]]=0);for(var o=0,s=1;s0;){var a=this.readyFns_.pop();a(this)}},t.prototype.computeYAxes_=function(){var e,a,i,r,n;if(void 0!==this.axes_&&this.user_attrs_.hasOwnProperty("valueRange")===!1)for(e=[],i=0;ii;i++)this.axes_[i].valueWindow=e[i]}for(a=0;al;l++){var h=this.axes_[l],p=this.attributes_.getForAxis("logscale",l),g=this.attributes_.getForAxis("includeZero",l),d=this.attributes_.getForAxis("independentTicks",l);if(i=this.attributes_.seriesForAxis(l),e=!0,r=.1,null!==this.getNumericOption("yRangePad")&&(e=!1,r=this.getNumericOption("yRangePad")/this.plotter_.area.h),0===i.length)h.extremeRange=[0,1];else{for(var u,c,y=1/0,_=-(1/0),v=0;v0&&(y=0),0>_&&(_=0)),y==1/0&&(y=0),_==-(1/0)&&(_=1),a=_-y,0===a&&(0!==_?a=Math.abs(_):(_=1,a=1));var f,x;if(p)if(e)f=_+r*a,x=y;else{var m=Math.exp(Math.log(a)*r);f=_*m,x=y/m}else f=_+r*a,x=y-r*a,e&&!this.getBooleanOption("avoidMinZero")&&(0>x&&y>=0&&(x=0),f>0&&0>=_&&(f=0));h.extremeRange=[x,f]}if(h.valueWindow)h.computedValueRange=[h.valueWindow[0],h.valueWindow[1]];else if(h.valueRange){var D=o(h.valueRange[0])?h.extremeRange[0]:h.valueRange[0],w=o(h.valueRange[1])?h.extremeRange[1]:h.valueRange[1];if(!e)if(h.logscale){var m=Math.exp(Math.log(a)*r);D*=m,w/=m}else a=w-D,D-=a*r,w+=a*r;h.computedValueRange=[D,w]}else h.computedValueRange=h.extremeRange;if(d){h.independentTicks=d;var A=this.optionsViewForAxis_("y"+(l?"2":"")),b=A("ticker");h.ticks=b(h.computedValueRange[0],h.computedValueRange[1],this.plotter_.area.h,A,this),n||(n=h)}}if(void 0===n)throw'Configuration Error: At least one axis has to have the "independentTicks" option activated.';for(var l=0;s>l;l++){var h=this.axes_[l];if(!h.independentTicks){for(var A=this.optionsViewForAxis_("y"+(l?"2":"")),b=A("ticker"),T=n.ticks,E=n.computedValueRange[1]-n.computedValueRange[0],C=h.computedValueRange[1]-h.computedValueRange[0],L=[],P=0;P0&&"e"!=t[a-1]&&"E"!=t[a-1]||t.indexOf("/")>=0||isNaN(parseFloat(t))?e=!0:8==t.length&&t>"19700101"&&"20371231">t&&(e=!0),this.setXAxisOptions_(e)},t.prototype.setXAxisOptions_=function(e){e?(this.attrs_.xValueParser=t.dateParser,this.attrs_.axes.x.valueFormatter=t.dateValueFormatter,this.attrs_.axes.x.ticker=t.dateTicker,this.attrs_.axes.x.axisLabelFormatter=t.dateAxisLabelFormatter):(this.attrs_.xValueParser=function(t){return parseFloat(t)},this.attrs_.axes.x.valueFormatter=function(t){return t},this.attrs_.axes.x.ticker=t.numericTicks,this.attrs_.axes.x.axisLabelFormatter=this.attrs_.axes.x.valueFormatter)},t.prototype.parseCSV_=function(e){var a,i,r=[],n=t.detectLineDelimiter(e),o=e.split(n||"\n"),s=this.getStringOption("delimiter");-1==o[0].indexOf(s)&&o[0].indexOf(" ")>=0&&(s=" ");var l=0;"labels"in 
this.user_attrs_||(l=1,this.attrs_.labels=o[0].split(s),this.attributes_.reparseSeries());for(var h,p=0,g=!1,d=this.attr_("labels").length,u=!1,c=l;c0&&v[0]0;)e=String.fromCharCode(65+(t-1)%26)+e.toLowerCase(),t=Math.floor((t-1)/26);return e},i=e.getNumberOfColumns(),r=e.getNumberOfRows(),n=e.getColumnType(0);if("date"==n||"datetime"==n)this.attrs_.xValueParser=t.dateParser,this.attrs_.axes.x.valueFormatter=t.dateValueFormatter,this.attrs_.axes.x.ticker=t.dateTicker,this.attrs_.axes.x.axisLabelFormatter=t.dateAxisLabelFormatter;else{if("number"!=n)return console.error("only 'date', 'datetime' and 'number' types are supported for column 1 of DataTable input (Got '"+n+"')"),null;this.attrs_.xValueParser=function(t){return parseFloat(t)},this.attrs_.axes.x.valueFormatter=function(t){return t},this.attrs_.axes.x.ticker=t.numericTicks,this.attrs_.axes.x.axisLabelFormatter=this.attrs_.axes.x.valueFormatter}var o,s,l=[],h={},p=!1;for(o=1;i>o;o++){var g=e.getColumnType(o);if("number"==g)l.push(o);else if("string"==g&&this.getBooleanOption("displayAnnotations")){var d=l[l.length-1];h.hasOwnProperty(d)?h[d].push(o):h[d]=[o],p=!0}else console.error("Only 'number' is supported as a dependent type with Gviz. 'string' is only supported if displayAnnotations is true")}var u=[e.getColumnLabel(0)];for(o=0;oo;o++){var v=[];if("undefined"!=typeof e.getValue(o,0)&&null!==e.getValue(o,0)){if(v.push("date"==n||"datetime"==n?e.getValue(o,0).getTime():e.getValue(o,0)),this.getBooleanOption("errorBars"))for(s=0;i-1>s;s++)v.push([e.getValue(o,1+2*s),e.getValue(o,2+2*s)]);else{for(s=0;s0&&v[0]0&&this.setAnnotations(_,!0),this.attributes_.reparseSeries()},t.prototype.cascadeDataDidUpdateEvent_=function(){this.cascadeEvents_("dataDidUpdate",{})},t.prototype.start_=function(){var e=this.file_;if("function"==typeof e&&(e=e()),t.isArrayLike(e))this.rawData_=this.parseArray_(e),this.cascadeDataDidUpdateEvent_(),this.predraw_();else if("object"==typeof e&&"function"==typeof e.getColumnRange)this.parseDataTable_(e),this.cascadeDataDidUpdateEvent_(),this.predraw_();else if("string"==typeof e){var a=t.detectLineDelimiter(e);if(a)this.loadedEvent_(e);else{var i;i=window.XMLHttpRequest?new XMLHttpRequest:new ActiveXObject("Microsoft.XMLHTTP");var r=this;i.onreadystatechange=function(){4==i.readyState&&(200===i.status||0===i.status)&&r.loadedEvent_(i.responseText)},i.open("GET",e,!0),i.send(null)}}else console.error("Unknown data format: "+typeof e)},t.prototype.updateOptions=function(e,a){"undefined"==typeof a&&(a=!1);var i=e.file,r=t.mapLegacyOptions_(e);"rollPeriod"in r&&(this.rollPeriod_=r.rollPeriod),"dateWindow"in r&&(this.dateWindow_=r.dateWindow,"isZoomedIgnoreProgrammaticZoom"in r||(this.zoomed_x_=null!==r.dateWindow)),"valueRange"in r&&!("isZoomedIgnoreProgrammaticZoom"in r)&&(this.zoomed_y_=null!==r.valueRange);var n=t.isPixelChangingOptionList(this.attr_("labels"),r);t.updateDeep(this.user_attrs_,r),this.attributes_.reparseSeries(),i?(this.cascadeEvents_("dataWillUpdate",{}),this.file_=i,a||this.start_()):a||(n?this.predraw_():this.renderGraph_(!1))},t.mapLegacyOptions_=function(t){var e={};for(var a in t)t.hasOwnProperty(a)&&"file"!=a&&t.hasOwnProperty(a)&&(e[a]=t[a]);var i=function(t,a,i){e.axes||(e.axes={}),e.axes[t]||(e.axes[t]={}),e.axes[t][a]=i},r=function(a,r,n){"undefined"!=typeof t[a]&&(console.warn("Option "+a+" is deprecated. Use the "+n+" option for the "+r+" axis instead. (e.g. { axes : { "+r+" : { "+n+" : ... 
} } } (see http://dygraphs.com/per-axis.html for more information."),i(r,n,t[a]),delete e[a])};return r("xValueFormatter","x","valueFormatter"),r("pixelsPerXLabel","x","pixelsPerLabel"),r("xAxisLabelFormatter","x","axisLabelFormatter"),r("xTicker","x","ticker"),r("yValueFormatter","y","valueFormatter"),r("pixelsPerYLabel","y","pixelsPerLabel"),r("yAxisLabelFormatter","y","axisLabelFormatter"),r("yTicker","y","ticker"),r("drawXGrid","x","drawGrid"),r("drawXAxis","x","drawAxis"),r("drawYGrid","y","drawGrid"),r("drawYAxis","y","drawAxis"),r("xAxisLabelWidth","x","axisLabelWidth"),r("yAxisLabelWidth","y","axisLabelWidth"),e},t.prototype.resize=function(t,e){if(!this.resize_lock){this.resize_lock=!0,null===t!=(null===e)&&(console.warn("Dygraph.resize() should be called with zero parameters or two non-NULL parameters. Pretending it was zero."),t=e=null);var a=this.width_,i=this.height_;t?(this.maindiv_.style.width=t+"px",this.maindiv_.style.height=e+"px",this.width_=t,this.height_=e):(this.width_=this.maindiv_.clientWidth,this.height_=this.maindiv_.clientHeight),(a!=this.width_||i!=this.height_)&&(this.resizeElements_(),this.predraw_()),this.resize_lock=!1}},t.prototype.adjustRoll=function(t){this.rollPeriod_=t,this.predraw_()},t.prototype.visibility=function(){for(this.getOption("visibility")||(this.attrs_.visibility=[]);this.getOption("visibility").lengtht||t>=a.length?console.warn("invalid series number in setVisibility: "+t):(a[t]=e,this.predraw_())},t.prototype.size=function(){return{width:this.width_,height:this.height_}},t.prototype.setAnnotations=function(e,a){return t.addAnnotationRule(),this.annotations_=e,this.layout_?(this.layout_.setAnnotations(this.annotations_),void(a||this.predraw_())):void console.warn("Tried to setAnnotations before dygraph was ready. Try setting them in a ready() block. 
See dygraphs.com/tests/annotation.html")},t.prototype.annotations=function(){return this.annotations_},t.prototype.getLabels=function(){var t=this.attr_("labels");return t?t.slice():null},t.prototype.indexFromSetName=function(t){return this.setIndexByName_[t]},t.prototype.ready=function(t){this.is_initial_draw_?this.readyFns_.push(t):t.call(this,this)},t.addAnnotationRule=function(){if(!t.addedAnnotationCSS){var e="border: 1px solid black; background-color: white; text-align: center;",a=document.createElement("style");a.type="text/css",document.getElementsByTagName("head")[0].appendChild(a);for(var i=0;it?"0"+t:""+t},Dygraph.DateAccessorsLocal={getFullYear:function(t){return t.getFullYear()},getMonth:function(t){return t.getMonth()},getDate:function(t){return t.getDate()},getHours:function(t){return t.getHours()},getMinutes:function(t){return t.getMinutes()},getSeconds:function(t){return t.getSeconds()},getMilliseconds:function(t){return t.getMilliseconds()},getDay:function(t){return t.getDay()},makeDate:function(t,e,a,i,r,n,o){return new Date(t,e,a,i,r,n,o)}},Dygraph.DateAccessorsUTC={getFullYear:function(t){return t.getUTCFullYear()},getMonth:function(t){return t.getUTCMonth()},getDate:function(t){return t.getUTCDate()},getHours:function(t){return t.getUTCHours()},getMinutes:function(t){return t.getUTCMinutes()},getSeconds:function(t){return t.getUTCSeconds()},getMilliseconds:function(t){return t.getUTCMilliseconds()},getDay:function(t){return t.getUTCDay()},makeDate:function(t,e,a,i,r,n,o){return new Date(Date.UTC(t,e,a,i,r,n,o))}},Dygraph.hmsString_=function(t,e,a){var i=Dygraph.zeropad,r=i(t)+":"+i(e);return a&&(r+=":"+i(a)),r},Dygraph.dateString_=function(t,e){var a=Dygraph.zeropad,i=e?Dygraph.DateAccessorsUTC:Dygraph.DateAccessorsLocal,r=new Date(t),n=i.getFullYear(r),o=i.getMonth(r),s=i.getDate(r),l=i.getHours(r),h=i.getMinutes(r),p=i.getSeconds(r),g=""+n,d=a(o+1),u=a(s),c=3600*l+60*h+p,y=g+"/"+d+"/"+u;return c&&(y+=" "+Dygraph.hmsString_(l,h,p)),y},Dygraph.round_=function(t,e){var a=Math.pow(10,e);return Math.round(t*a)/a},Dygraph.binarySearch=function(t,e,a,i,r){if((null===i||void 0===i||null===r||void 0===r)&&(i=0,r=e.length-1),i>r)return-1;(null===a||void 0===a)&&(a=0);var n,o=function(t){return t>=0&&tt?a>0&&(n=s-1,o(n)&&e[n]l?0>a&&(n=s+1,o(n)&&e[n]>t)?s:Dygraph.binarySearch(t,e,a,s+1,r):-1},Dygraph.dateParser=function(t){var e,a;if((-1==t.search("-")||-1!=t.search("T")||-1!=t.search("Z"))&&(a=Dygraph.dateStrToMillis(t),a&&!isNaN(a)))return a;if(-1!=t.search("-")){for(e=t.replace("-","/","g");-1!=e.search("-");)e=e.replace("-","/");a=Dygraph.dateStrToMillis(e)}else 8==t.length?(e=t.substr(0,4)+"/"+t.substr(4,2)+"/"+t.substr(6,2),a=Dygraph.dateStrToMillis(e)):a=Dygraph.dateStrToMillis(t);return(!a||isNaN(a))&&console.error("Couldn't parse "+t+" as a date"),a},Dygraph.dateStrToMillis=function(t){return new Date(t).getTime()},Dygraph.update=function(t,e){if("undefined"!=typeof e&&null!==e)for(var a in e)e.hasOwnProperty(a)&&(t[a]=e[a]);return t},Dygraph.updateDeep=function(t,e){function a(t){return"object"==typeof Node?t instanceof Node:"object"==typeof t&&"number"==typeof t.nodeType&&"string"==typeof t.nodeName}if("undefined"!=typeof e&&null!==e)for(var i in e)e.hasOwnProperty(i)&&(null===e[i]?t[i]=null:Dygraph.isArrayLike(e[i])?t[i]=e[i].slice():a(e[i])?t[i]=e[i]:"object"==typeof e[i]?(("object"!=typeof t[i]||null===t[i])&&(t[i]={}),Dygraph.updateDeep(t[i],e[i])):t[i]=e[i]);return t},Dygraph.isArrayLike=function(t){var e=typeof 
t;return"object"!=e&&("function"!=e||"function"!=typeof t.item)||null===t||"number"!=typeof t.length||3===t.nodeType?!1:!0},Dygraph.isDateLike=function(t){return"object"!=typeof t||null===t||"function"!=typeof t.getTime?!1:!0},Dygraph.clone=function(t){for(var e=[],a=0;a=e||Dygraph.requestAnimFrame.call(window,function(){var e=(new Date).getTime(),h=e-o;r=n,n=Math.floor(h/a);var p=n-r,g=n+p>s;g||n>=s?(t(s),i()):(0!==p&&t(n),l())})}()};var e={annotationClickHandler:!0,annotationDblClickHandler:!0,annotationMouseOutHandler:!0,annotationMouseOverHandler:!0,axisLabelColor:!0,axisLineColor:!0,axisLineWidth:!0,clickCallback:!0,drawCallback:!0,drawHighlightPointCallback:!0,drawPoints:!0,drawPointCallback:!0,drawXGrid:!0,drawYGrid:!0,fillAlpha:!0,gridLineColor:!0,gridLineWidth:!0,hideOverlayOnMouseOut:!0,highlightCallback:!0,highlightCircleSize:!0,interactionModel:!0,isZoomedIgnoreProgrammaticZoom:!0,labelsDiv:!0,labelsDivStyles:!0,labelsDivWidth:!0,labelsKMB:!0,labelsKMG2:!0,labelsSeparateLines:!0,labelsShowZeroValues:!0,legend:!0,panEdgeFraction:!0,pixelsPerYLabel:!0,pointClickCallback:!0,pointSize:!0,rangeSelectorPlotFillColor:!0,rangeSelectorPlotStrokeColor:!0,showLabelsOnHighlight:!0,showRoller:!0,strokeWidth:!0,underlayCallback:!0,unhighlightCallback:!0,zoomCallback:!0};Dygraph.isPixelChangingOptionList=function(t,a){var i={};if(t)for(var r=1;re?1/Math.pow(t,-e):Math.pow(t,e)};var a=/^rgba?\((\d{1,3}),\s*(\d{1,3}),\s*(\d{1,3})(?:,\s*([01](?:\.\d+)?))?\)$/;Dygraph.toRGB_=function(e){var a=t(e);if(a)return a;var i=document.createElement("div");i.style.backgroundColor=e,i.style.visibility="hidden",document.body.appendChild(i);var r;return r=window.getComputedStyle?window.getComputedStyle(i,null).backgroundColor:i.currentStyle.backgroundColor,document.body.removeChild(i),t(r)},Dygraph.isCanvasSupported=function(t){var e;try{e=t||document.createElement("canvas"),e.getContext("2d")}catch(a){var i=navigator.appVersion.match(/MSIE (\d\.\d)/),r=-1!=navigator.userAgent.toLowerCase().indexOf("opera");return!i||i[1]<6||r?!1:!0}return!0},Dygraph.parseFloat_=function(t,e,a){var i=parseFloat(t);if(!isNaN(i))return i;if(/^ *$/.test(t))return null;if(/^ *nan *$/i.test(t))return 0/0;var r="Unable to parse '"+t+"' as a number";return void 0!==a&&void 0!==e&&(r+=" on line "+(1+(e||0))+" ('"+a+"') of CSV."),console.error(r),null}}(),function(){"use strict";Dygraph.GVizChart=function(t){this.container=t},Dygraph.GVizChart.prototype.draw=function(t,e){this.container.innerHTML="","undefined"!=typeof this.date_graph&&this.date_graph.destroy(),this.date_graph=new Dygraph(this.container,t,e)},Dygraph.GVizChart.prototype.setSelection=function(t){var e=!1;t.length&&(e=t[0].row),this.date_graph.setSelection(e)},Dygraph.GVizChart.prototype.getSelection=function(){var t=[],e=this.date_graph.getSelection();if(0>e)return t;for(var a=this.date_graph.layout_.points,i=0;ii&&2>r&&void 0!==e.lastx_&&-1!=e.lastx_&&Dygraph.Interaction.treatMouseOpAsClick(e,t,a),a.regionWidth=i,a.regionHeight=r},Dygraph.Interaction.startPan=function(t,e,a){var i,r;a.isPanning=!0;var n=e.xAxisRange();if(e.getOptionForAxis("logscale","x")?(a.initialLeftmostDate=Dygraph.log10(n[0]),a.dateRange=Dygraph.log10(n[1])-Dygraph.log10(n[0])):(a.initialLeftmostDate=n[0],a.dateRange=n[1]-n[0]),a.xUnitsPerPixel=a.dateRange/(e.plotter_.area.w-1),e.getNumericOption("panEdgeFraction")){var 
o=e.width_*e.getNumericOption("panEdgeFraction"),s=e.xAxisExtremes(),l=e.toDomXCoord(s[0])-o,h=e.toDomXCoord(s[1])+o,p=e.toDataXCoord(l),g=e.toDataXCoord(h);a.boundedDates=[p,g];var d=[],u=e.height_*e.getNumericOption("panEdgeFraction");for(i=0;ia.boundedDates[1]&&(i-=r-a.boundedDates[1],r=i+a.dateRange),e.getOptionForAxis("logscale","x")?e.dateWindow_=[Math.pow(Dygraph.LOG_SCALE,i),Math.pow(Dygraph.LOG_SCALE,r)]:e.dateWindow_=[i,r],a.is2DPan)for(var n=a.dragEndY-a.dragStartY,o=0;oi?Dygraph.VERTICAL:Dygraph.HORIZONTAL,e.drawZoomRect_(a.dragDirection,a.dragStartX,a.dragEndX,a.dragStartY,a.dragEndY,a.prevDragDirection,a.prevEndX,a.prevEndY),a.prevEndX=a.dragEndX,a.prevEndY=a.dragEndY,a.prevDragDirection=a.dragDirection},Dygraph.Interaction.treatMouseOpAsClick=function(t,e,a){for(var i=t.getFunctionOption("clickCallback"),r=t.getFunctionOption("pointClickCallback"),n=null,o=-1,s=Number.MAX_VALUE,l=0;lp)&&(s=p,o=l)}var g=t.getNumericOption("highlightCircleSize")+2;if(g*g>=s&&(n=t.selPoints_[o]),n){var d={cancelable:!0,point:n,canvasx:a.dragEndX,canvasy:a.dragEndY},u=t.cascadeEvents_("pointClick",d);if(u)return;r&&r.call(t,e,n)}var d={cancelable:!0,xval:t.lastx_,pts:t.selPoints_,canvasx:a.dragEndX,canvasy:a.dragEndY};t.cascadeEvents_("click",d)||i&&i.call(t,e,t.lastx_,t.selPoints_)},Dygraph.Interaction.endZoom=function(t,e,a){e.clearZoomRect_(),a.isZooming=!1,Dygraph.Interaction.maybeTreatMouseOpAsClick(t,e,a);var i=e.getArea();if(a.regionWidth>=10&&a.dragDirection==Dygraph.HORIZONTAL){var r=Math.min(a.dragStartX,a.dragEndX),n=Math.max(a.dragStartX,a.dragEndX);r=Math.max(r,i.x),n=Math.min(n,i.x+i.w),n>r&&e.doZoomX_(r,n),a.cancelNextDblclick=!0}else if(a.regionHeight>=10&&a.dragDirection==Dygraph.VERTICAL){var o=Math.min(a.dragStartY,a.dragEndY),s=Math.max(a.dragStartY,a.dragEndY);o=Math.max(o,i.y),s=Math.min(s,i.y+i.h),s>o&&e.doZoomY_(o,s),a.cancelNextDblclick=!0}a.dragStartX=null,a.dragStartY=null},Dygraph.Interaction.startTouch=function(t,e,a){t.preventDefault(),t.touches.length>1&&(a.startTimeForDoubleTapMs=null);for(var i=[],r=0;r=2){a.initialPinchCenter={pageX:.5*(i[0].pageX+i[1].pageX),pageY:.5*(i[0].pageY+i[1].pageY),dataX:.5*(i[0].dataX+i[1].dataX),dataY:.5*(i[0].dataY+i[1].dataY)};var o=180/Math.PI*Math.atan2(a.initialPinchCenter.pageY-i[0].pageY,i[0].pageX-a.initialPinchCenter.pageX);o=Math.abs(o),o>90&&(o=90-o),a.touchDirections={x:67.5>o,y:o>22.5}}a.initialRange={x:e.xAxisRange(),y:e.yAxisRange()}},Dygraph.Interaction.moveTouch=function(t,e,a){a.startTimeForDoubleTapMs=null;var i,r=[];for(i=0;i=2){var c=s[1].pageX-l.pageX;d=(r[1].pageX-o.pageX)/c;var y=s[1].pageY-l.pageY;u=(r[1].pageY-o.pageY)/y}d=Math.min(8,Math.max(.125,d)),u=Math.min(8,Math.max(.125,u));var _=!1;if(a.touchDirections.x&&(e.dateWindow_=[l.dataX-h.dataX+(a.initialRange.x[0]-l.dataX)/d,l.dataX-h.dataX+(a.initialRange.x[1]-l.dataX)/d],_=!0),a.touchDirections.y)for(i=0;1>i;i++){var v=e.axes_[i],f=e.attributes_.getForAxis("logscale",i);f||(v.valueWindow=[l.dataY-h.dataY+(a.initialRange.y[0]-l.dataY)/u,l.dataY-h.dataY+(a.initialRange.y[1]-l.dataY)/u],_=!0)}if(e.drawGraph_(!1),_&&r.length>1&&e.getFunctionOption("zoomCallback")){var x=e.xAxisRange();e.getFunctionOption("zoomCallback").call(e,x[0],x[1],e.yAxisRanges())}},Dygraph.Interaction.endTouch=function(t,e,a){if(0!==t.touches.length)Dygraph.Interaction.startTouch(t,e,a);else if(1==t.changedTouches.length){var i=(new 
Date).getTime(),r=t.changedTouches[0];a.startTimeForDoubleTapMs&&i-a.startTimeForDoubleTapMs<500&&a.doubleTapX&&Math.abs(a.doubleTapX-r.screenX)<50&&a.doubleTapY&&Math.abs(a.doubleTapY-r.screenY)<50?e.resetZoom():(a.startTimeForDoubleTapMs=i,a.doubleTapX=r.screenX,a.doubleTapY=r.screenY)}};var e=function(t,e,a){return e>t?e-t:t>a?t-a:0},a=function(t,a){var i=Dygraph.findPos(a.canvas_),r={left:i.x,right:i.x+a.canvas_.offsetWidth,top:i.y,bottom:i.y+a.canvas_.offsetHeight},n={x:Dygraph.pageX(t),y:Dygraph.pageY(t)},o=e(n.x,r.left,r.right),s=e(n.y,r.top,r.bottom);return Math.max(o,s)};Dygraph.Interaction.defaultModel={mousedown:function(e,i,r){if(!e.button||2!=e.button){r.initializeMouseDown(e,i,r),e.altKey||e.shiftKey?Dygraph.startPan(e,i,r):Dygraph.startZoom(e,i,r);var n=function(e){if(r.isZooming){var n=a(e,i);t>n?Dygraph.moveZoom(e,i,r):null!==r.dragEndX&&(r.dragEndX=null,r.dragEndY=null,i.clearZoomRect_())}else r.isPanning&&Dygraph.movePan(e,i,r)},o=function(t){r.isZooming?null!==r.dragEndX?Dygraph.endZoom(t,i,r):Dygraph.Interaction.maybeTreatMouseOpAsClick(t,i,r):r.isPanning&&Dygraph.endPan(t,i,r),Dygraph.removeEvent(document,"mousemove",n),Dygraph.removeEvent(document,"mouseup",o),r.destroy()};i.addAndTrackEvent(document,"mousemove",n),i.addAndTrackEvent(document,"mouseup",o)}},willDestroyContextMyself:!0,touchstart:function(t,e,a){Dygraph.Interaction.startTouch(t,e,a)},touchmove:function(t,e,a){Dygraph.Interaction.moveTouch(t,e,a)},touchend:function(t,e,a){Dygraph.Interaction.endTouch(t,e,a)},dblclick:function(t,e,a){if(a.cancelNextDblclick)return void(a.cancelNextDblclick=!1);var i={canvasx:a.dragEndX,canvasy:a.dragEndY};e.cascadeEvents_("dblclick",i)||t.altKey||t.shiftKey||e.resetZoom()}},Dygraph.DEFAULT_ATTRS.interactionModel=Dygraph.Interaction.defaultModel,Dygraph.defaultInteractionModel=Dygraph.Interaction.defaultModel,Dygraph.endZoom=Dygraph.Interaction.endZoom,Dygraph.moveZoom=Dygraph.Interaction.moveZoom,Dygraph.startZoom=Dygraph.Interaction.startZoom,Dygraph.endPan=Dygraph.Interaction.endPan,Dygraph.movePan=Dygraph.Interaction.movePan,Dygraph.startPan=Dygraph.Interaction.startPan,Dygraph.Interaction.nonInteractiveModel_={mousedown:function(t,e,a){a.initializeMouseDown(t,e,a)},mouseup:Dygraph.Interaction.maybeTreatMouseOpAsClick},Dygraph.Interaction.dragIsPanInteractionModel={mousedown:function(t,e,a){a.initializeMouseDown(t,e,a),Dygraph.startPan(t,e,a)},mousemove:function(t,e,a){a.isPanning&&Dygraph.movePan(t,e,a)},mouseup:function(t,e,a){a.isPanning&&Dygraph.endPan(t,e,a)}}}(),function(){"use strict";Dygraph.TickList=void 0,Dygraph.Ticker=void 0,Dygraph.numericLinearTicks=function(t,e,a,i,r,n){var o=function(t){return"logscale"===t?!1:i(t)};return Dygraph.numericTicks(t,e,a,o,r,n)},Dygraph.numericTicks=function(t,e,a,i,r,n){var o,s,l,h,p=i("pixelsPerLabel"),g=[];if(n)for(o=0;o=h/4){for(var y=u;y>=d;y--){var _=Dygraph.PREFERRED_LOG_TICK_VALUES[y],v=Math.log(_/t)/Math.log(e/t)*a,f={v:_};null===c?c={tickValue:_,pixel_coord:v}:Math.abs(v-c.pixel_coord)>=p?c={tickValue:_,pixel_coord:v}:f.label="",g.push(f)}g.reverse()}}if(0===g.length){var x,m,D=i("labelsKMG2");D?(x=[1,2,4,8,16,32,64,128,256],m=16):(x=[1,2,5,10,20,50,100],m=10);var w,A,b,T,E=Math.ceil(a/p),C=Math.abs(e-t)/E,L=Math.floor(Math.log(C)/Math.log(m)),P=Math.pow(m,L);for(s=0;sp));s++);for(A>b&&(w*=-1),o=0;h>=o;o++)l=A+o*w,g.push({v:l})}}var 
S=i("axisLabelFormatter");for(o=0;o=0?Dygraph.getDateAxis(t,e,o,i,r):[]},Dygraph.SECONDLY=0,Dygraph.TWO_SECONDLY=1,Dygraph.FIVE_SECONDLY=2,Dygraph.TEN_SECONDLY=3,Dygraph.THIRTY_SECONDLY=4,Dygraph.MINUTELY=5,Dygraph.TWO_MINUTELY=6,Dygraph.FIVE_MINUTELY=7,Dygraph.TEN_MINUTELY=8,Dygraph.THIRTY_MINUTELY=9,Dygraph.HOURLY=10,Dygraph.TWO_HOURLY=11,Dygraph.SIX_HOURLY=12,Dygraph.DAILY=13,Dygraph.TWO_DAILY=14,Dygraph.WEEKLY=15,Dygraph.MONTHLY=16,Dygraph.QUARTERLY=17,Dygraph.BIANNUAL=18,Dygraph.ANNUAL=19,Dygraph.DECADAL=20,Dygraph.CENTENNIAL=21,Dygraph.NUM_GRANULARITIES=22,Dygraph.DATEFIELD_Y=0,Dygraph.DATEFIELD_M=1,Dygraph.DATEFIELD_D=2,Dygraph.DATEFIELD_HH=3,Dygraph.DATEFIELD_MM=4,Dygraph.DATEFIELD_SS=5,Dygraph.DATEFIELD_MS=6,Dygraph.NUM_DATEFIELDS=7,Dygraph.TICK_PLACEMENT=[],Dygraph.TICK_PLACEMENT[Dygraph.SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:1,spacing:1e3},Dygraph.TICK_PLACEMENT[Dygraph.TWO_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:2,spacing:2e3},Dygraph.TICK_PLACEMENT[Dygraph.FIVE_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:5,spacing:5e3},Dygraph.TICK_PLACEMENT[Dygraph.TEN_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:10,spacing:1e4},Dygraph.TICK_PLACEMENT[Dygraph.THIRTY_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:30,spacing:3e4},Dygraph.TICK_PLACEMENT[Dygraph.MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:1,spacing:6e4},Dygraph.TICK_PLACEMENT[Dygraph.TWO_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:2,spacing:12e4},Dygraph.TICK_PLACEMENT[Dygraph.FIVE_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:5,spacing:3e5},Dygraph.TICK_PLACEMENT[Dygraph.TEN_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:10,spacing:6e5},Dygraph.TICK_PLACEMENT[Dygraph.THIRTY_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:30,spacing:18e5},Dygraph.TICK_PLACEMENT[Dygraph.HOURLY]={datefield:Dygraph.DATEFIELD_HH,step:1,spacing:36e5},Dygraph.TICK_PLACEMENT[Dygraph.TWO_HOURLY]={datefield:Dygraph.DATEFIELD_HH,step:2,spacing:72e5},Dygraph.TICK_PLACEMENT[Dygraph.SIX_HOURLY]={datefield:Dygraph.DATEFIELD_HH,step:6,spacing:216e5},Dygraph.TICK_PLACEMENT[Dygraph.DAILY]={datefield:Dygraph.DATEFIELD_D,step:1,spacing:864e5},Dygraph.TICK_PLACEMENT[Dygraph.TWO_DAILY]={datefield:Dygraph.DATEFIELD_D,step:2,spacing:1728e5},Dygraph.TICK_PLACEMENT[Dygraph.WEEKLY]={datefield:Dygraph.DATEFIELD_D,step:7,spacing:6048e5},Dygraph.TICK_PLACEMENT[Dygraph.MONTHLY]={datefield:Dygraph.DATEFIELD_M,step:1,spacing:2629817280},Dygraph.TICK_PLACEMENT[Dygraph.QUARTERLY]={datefield:Dygraph.DATEFIELD_M,step:3,spacing:216e5*365.2524},Dygraph.TICK_PLACEMENT[Dygraph.BIANNUAL]={datefield:Dygraph.DATEFIELD_M,step:6,spacing:432e5*365.2524},Dygraph.TICK_PLACEMENT[Dygraph.ANNUAL]={datefield:Dygraph.DATEFIELD_Y,step:1,spacing:864e5*365.2524},Dygraph.TICK_PLACEMENT[Dygraph.DECADAL]={datefield:Dygraph.DATEFIELD_Y,step:10,spacing:315578073600},Dygraph.TICK_PLACEMENT[Dygraph.CENTENNIAL]={datefield:Dygraph.DATEFIELD_Y,step:100,spacing:3155780736e3},Dygraph.PREFERRED_LOG_TICK_VALUES=function(){for(var t=[],e=-39;39>=e;e++)for(var a=Math.pow(10,e),i=1;9>=i;i++){var r=a*i;t.push(r)}return t}(),Dygraph.pickDateTickGranularity=function(t,e,a,i){for(var r=i("pixelsPerLabel"),n=0;n=r)return n}return-1},Dygraph.numDateTicks=function(t,e,a){var i=Dygraph.TICK_PLACEMENT[a].spacing;return Math.round(1*(e-t)/i)},Dygraph.getDateAxis=function(t,e,a,i,r){var n=i("axisLabelFormatter"),o=i("labelsUTC"),s=o?Dygraph.DateAccessorsUTC:Dygraph.DateAccessorsLocal,l=Dygraph.TICK_PLACEMENT[a].datefield,h=Dygraph.TICK_PLACEMENT[a].step,p=Dygraph.TICK_PLACEMENT[a].spacing,g=new 
Date(t),d=[];d[Dygraph.DATEFIELD_Y]=s.getFullYear(g),d[Dygraph.DATEFIELD_M]=s.getMonth(g),d[Dygraph.DATEFIELD_D]=s.getDate(g),d[Dygraph.DATEFIELD_HH]=s.getHours(g),d[Dygraph.DATEFIELD_MM]=s.getMinutes(g),d[Dygraph.DATEFIELD_SS]=s.getSeconds(g),d[Dygraph.DATEFIELD_MS]=s.getMilliseconds(g);var u=d[l]%h;a==Dygraph.WEEKLY&&(u=s.getDay(g)),d[l]-=u;for(var c=l+1;cv&&(v+=p,_=new Date(v));e>=v;)y.push({v:v,label:n.call(r,_,a,i,r)}),v+=p,_=new Date(v);else for(t>v&&(d[l]+=h,_=s.makeDate.apply(null,d),v=_.getTime());e>=v;)(a>=Dygraph.DAILY||s.getHours(_)%h===0)&&y.push({v:v,label:n.call(r,_,a,i,r)}),d[l]+=h,_=s.makeDate.apply(null,d),v=_.getTime();return y},Dygraph&&Dygraph.DEFAULT_ATTRS&&Dygraph.DEFAULT_ATTRS.axes&&Dygraph.DEFAULT_ATTRS.axes.x&&Dygraph.DEFAULT_ATTRS.axes.y&&Dygraph.DEFAULT_ATTRS.axes.y2&&(Dygraph.DEFAULT_ATTRS.axes.x.ticker=Dygraph.dateTicker,Dygraph.DEFAULT_ATTRS.axes.y.ticker=Dygraph.numericTicks,Dygraph.DEFAULT_ATTRS.axes.y2.ticker=Dygraph.numericTicks)}(),Dygraph.Plugins={},Dygraph.Plugins.Annotations=function(){"use strict";var t=function(){this.annotations_=[]};return t.prototype.toString=function(){return"Annotations Plugin"},t.prototype.activate=function(t){return{clearChart:this.clearChart,didDrawChart:this.didDrawChart}},t.prototype.detachLabels=function(){for(var t=0;to.x+o.w||h.canvasyo.y+o.h)){var p=h.annotation,g=6;p.hasOwnProperty("tickHeight")&&(g=p.tickHeight);var d=document.createElement("div");for(var u in r)r.hasOwnProperty(u)&&(d.style[u]=r[u]);p.hasOwnProperty("icon")||(d.className="dygraphDefaultAnnotation"),p.hasOwnProperty("cssClass")&&(d.className+=" "+p.cssClass);var c=p.hasOwnProperty("width")?p.width:16,y=p.hasOwnProperty("height")?p.height:16;if(p.hasOwnProperty("icon")){var _=document.createElement("img");_.src=p.icon,_.width=c,_.height=y,d.appendChild(_)}else h.annotation.hasOwnProperty("shortText")&&d.appendChild(document.createTextNode(h.annotation.shortText));var v=h.canvasx-c/2;d.style.left=v+"px";var f=0;if(p.attachAtBottom){var x=o.y+o.h-y-g;s[v]?x-=s[v]:s[v]=0,s[v]+=g+y,f=x}else f=h.canvasy-y-g;d.style.top=f+"px",d.style.width=c+"px",d.style.height=y+"px",d.title=h.annotation.text,d.style.color=e.colorsMap_[h.name],d.style.borderColor=e.colorsMap_[h.name],p.div=d,e.addAndTrackEvent(d,"click",n("clickHandler","annotationClickHandler",h,this)),e.addAndTrackEvent(d,"mouseover",n("mouseOverHandler","annotationMouseOverHandler",h,this)),e.addAndTrackEvent(d,"mouseout",n("mouseOutHandler","annotationMouseOutHandler",h,this)),e.addAndTrackEvent(d,"dblclick",n("dblClickHandler","annotationDblClickHandler",h,this)),i.appendChild(d),this.annotations_.push(d);var m=t.drawingContext;if(m.save(),m.strokeStyle=e.colorsMap_[h.name],m.beginPath(),p.attachAtBottom){var x=f+y;m.moveTo(h.canvasx,x),m.lineTo(h.canvasx,x+g)}else m.moveTo(h.canvasx,h.canvasy),m.lineTo(h.canvasx,h.canvasy-2-g);m.closePath(),m.stroke(),m.restore()}}},t.prototype.destroy=function(){this.detachLabels()},t}(),Dygraph.Plugins.Axes=function(){"use strict";var t=function(){this.xlabels_=[],this.ylabels_=[]};return t.prototype.toString=function(){return"Axes Plugin"},t.prototype.activate=function(t){return{layout:this.layout,clearChart:this.clearChart,willDrawChart:this.willDrawChart}},t.prototype.layout=function(t){var e=t.dygraph;if(e.getOptionForAxis("drawAxis","y")){var a=e.getOptionForAxis("axisLabelWidth","y")+2*e.getOptionForAxis("axisTickSize","y");t.reserveSpaceLeft(a)}if(e.getOptionForAxis("drawAxis","x")){var 
i;i=e.getOption("xAxisHeight")?e.getOption("xAxisHeight"):e.getOptionForAxis("axisLabelFontSize","x")+2*e.getOptionForAxis("axisTickSize","x"),t.reserveSpaceBottom(i)}if(2==e.numAxes()){if(e.getOptionForAxis("drawAxis","y2")){var a=e.getOptionForAxis("axisLabelWidth","y2")+2*e.getOptionForAxis("axisTickSize","y2");t.reserveSpaceRight(a)}}else e.numAxes()>2&&e.error("Only two y-axes are supported at this time. (Trying to use "+e.numAxes()+")")},t.prototype.detachLabels=function(){function t(t){for(var e=0;e0){var x=i.numAxes(),m=[f("y"),f("y2")];for(l=0;l<_.yticks.length;l++){if(s=_.yticks[l],"function"==typeof s)return;n=v.x;var D=1,w="y1",A=m[0];1==s[0]&&(n=v.x+v.w,D=-1,w="y2",A=m[1]);var b=A("axisLabelFontSize");o=v.y+s[1]*v.h,r=y(s[2],"y",2==x?w:null);var T=o-b/2;0>T&&(T=0),T+b+3>d?r.style.bottom="0":r.style.top=T+"px",0===s[0]?(r.style.left=v.x-A("axisLabelWidth")-A("axisTickSize")+"px",r.style.textAlign="right"):1==s[0]&&(r.style.left=v.x+v.w+A("axisTickSize")+"px",r.style.textAlign="left"),r.style.width=A("axisLabelWidth")+"px",p.appendChild(r),this.ylabels_.push(r)}var E=this.ylabels_[0],b=i.getOptionForAxis("axisLabelFontSize","y"),C=parseInt(E.style.top,10)+b;C>d-b&&(E.style.top=parseInt(E.style.top,10)-b/2+"px")}var L;if(i.getOption("drawAxesAtZero")){var P=i.toPercentXCoord(0);(P>1||0>P||isNaN(P))&&(P=0),L=e(v.x+P*v.w)}else L=e(v.x);h.strokeStyle=i.getOptionForAxis("axisLineColor","y"),h.lineWidth=i.getOptionForAxis("axisLineWidth","y"),h.beginPath(),h.moveTo(L,a(v.y)),h.lineTo(L,a(v.y+v.h)),h.closePath(),h.stroke(),2==i.numAxes()&&(h.strokeStyle=i.getOptionForAxis("axisLineColor","y2"),h.lineWidth=i.getOptionForAxis("axisLineWidth","y2"),h.beginPath(),h.moveTo(a(v.x+v.w),a(v.y)),h.lineTo(a(v.x+v.w),a(v.y+v.h)),h.closePath(),h.stroke())}if(i.getOptionForAxis("drawAxis","x")){if(_.xticks){var A=f("x");for(l=0;l<_.xticks.length;l++){s=_.xticks[l],n=v.x+s[0]*v.w,o=v.y+v.h,r=y(s[1],"x"),r.style.textAlign="center",r.style.top=o+A("axisTickSize")+"px";var S=n-A("axisLabelWidth")/2;S+A("axisLabelWidth")>g&&(S=g-A("axisLabelWidth"),r.style.textAlign="right"),0>S&&(S=0,r.style.textAlign="left"),r.style.left=S+"px",r.style.width=A("axisLabelWidth")+"px", +p.appendChild(r),this.xlabels_.push(r)}}h.strokeStyle=i.getOptionForAxis("axisLineColor","x"),h.lineWidth=i.getOptionForAxis("axisLineWidth","x"),h.beginPath();var O;if(i.getOption("drawAxesAtZero")){var P=i.toPercentYCoord(0,0);(P>1||0>P)&&(P=1),O=a(v.y+P*v.h)}else O=a(v.y+v.h);h.moveTo(e(v.x),O),h.lineTo(e(v.x+v.w),O),h.closePath(),h.stroke()}h.restore()}},t}(),Dygraph.Plugins.ChartLabels=function(){"use strict";var t=function(){this.title_div_=null,this.xlabel_div_=null,this.ylabel_div_=null,this.y2label_div_=null};t.prototype.toString=function(){return"ChartLabels Plugin"},t.prototype.activate=function(t){return{layout:this.layout,didDrawChart:this.didDrawChart}};var e=function(t){var e=document.createElement("div");return e.style.position="absolute",e.style.left=t.x+"px",e.style.top=t.y+"px",e.style.width=t.w+"px",e.style.height=t.h+"px",e};t.prototype.detachLabels_=function(){for(var t=[this.title_div_,this.xlabel_div_,this.ylabel_div_,this.y2label_div_],e=0;e=2);for(o=h.yticks,l.save(),n=0;n=2;for(y&&l.installPattern(_),l.strokeStyle=s.getOptionForAxis("gridLineColor","x"),l.lineWidth=s.getOptionForAxis("gridLineWidth","x"),n=0;n/g,">")};return t.prototype.select=function(e){var a=e.selectedX,i=e.selectedPoints,r=e.selectedRow,n=e.dygraph.getOption("legend");if("never"===n)return 
void(this.legend_div_.style.display="none");if("follow"===n){var o=e.dygraph.plotter_.area,s=e.dygraph.getOption("labelsDivWidth"),l=e.dygraph.getOptionForAxis("axisLabelWidth","y"),h=i[0].x*o.w+20,p=i[0].y*o.h-20;h+s+1>window.scrollX+window.innerWidth&&(h=h-40-s-(l-o.x)),e.dygraph.graphDiv.appendChild(this.legend_div_),this.legend_div_.style.left=l+h+"px",this.legend_div_.style.top=p+"px"}var g=t.generateLegendHTML(e.dygraph,a,i,this.one_em_width_,r);this.legend_div_.innerHTML=g,this.legend_div_.style.display=""},t.prototype.deselect=function(e){var i=e.dygraph.getOption("legend");"always"!==i&&(this.legend_div_.style.display="none");var r=a(this.legend_div_);this.one_em_width_=r;var n=t.generateLegendHTML(e.dygraph,void 0,void 0,r,null);this.legend_div_.innerHTML=n},t.prototype.didDrawChart=function(t){this.deselect(t)},t.prototype.predraw=function(t){if(this.is_generated_div_){t.dygraph.graphDiv.appendChild(this.legend_div_);var e=t.dygraph.plotter_.area,a=t.dygraph.getOption("labelsDivWidth");this.legend_div_.style.left=e.x+e.w-a-1+"px",this.legend_div_.style.top=e.y+"px",this.legend_div_.style.width=a+"px"}},t.prototype.destroy=function(){this.legend_div_=null},t.generateLegendHTML=function(t,a,r,n,o){if(t.getOption("showLabelsOnHighlight")!==!0)return"";var s,l,h,p,g,d=t.getLabels();if("undefined"==typeof a){if("always"!=t.getOption("legend"))return"";for(l=t.getOption("labelsSeparateLines"),s="",h=1;h":" "),g=t.getOption("strokePattern",d[h]),p=e(g,u.color,n),s+=""+p+" "+i(d[h])+"")}return s}var c=t.optionsViewForAxis_("x"),y=c("valueFormatter");s=y.call(t,a,c,d[0],t,o,0),""!==s&&(s+=":");var _=[],v=t.numAxes();for(h=0;v>h;h++)_[h]=t.optionsViewForAxis_("y"+(h?1+h:""));var f=t.getOption("labelsShowZeroValues");l=t.getOption("labelsSeparateLines");var x=t.getHighlightSeries();for(h=0;h");var u=t.getPropertiesForSeries(m.name),D=_[u.axis-1],w=D("valueFormatter"),A=w.call(t,m.yval,D,m.name,t,o,d.indexOf(m.name)),b=m.name==x?" class='highlight'":"";s+=" "+i(m.name)+": "+A+""}}return s},e=function(t,e,a){var i=/MSIE/.test(navigator.userAgent)&&!window.opera;if(i)return"—";if(!t||t.length<=1)return'
      ';var r,n,o,s,l,h=0,p=0,g=[];for(r=0;r<=t.length;r++)h+=t[r%t.length];if(l=Math.floor(a/(h-t[0])),l>1){for(r=0;rn;n++)for(r=0;p>r;r+=2)o=g[r%g.length],s=r';return d},t}(),Dygraph.Plugins.RangeSelector=function(){"use strict";var t=function(){this.isIE_=/MSIE/.test(navigator.userAgent)&&!window.opera,this.hasTouchInterface_="undefined"!=typeof TouchEvent,this.isMobileDevice_=/mobile|android/gi.test(navigator.appVersion),this.interfaceCreated_=!1};return t.prototype.toString=function(){return"RangeSelector Plugin"},t.prototype.activate=function(t){return this.dygraph_=t,this.isUsingExcanvas_=t.isUsingExcanvas_,this.getOption_("showRangeSelector")&&this.createInterface_(),{layout:this.reserveSpace_,predraw:this.renderStaticLayer_,didDrawChart:this.renderInteractiveLayer_}},t.prototype.destroy=function(){this.bgcanvas_=null,this.fgcanvas_=null,this.leftZoomHandle_=null,this.rightZoomHandle_=null,this.iePanOverlay_=null},t.prototype.getOption_=function(t,e){return this.dygraph_.getOption(t,e)},t.prototype.setDefaultOption_=function(t,e){this.dygraph_.attrs_[t]=e},t.prototype.createInterface_=function(){this.createCanvases_(),this.isUsingExcanvas_&&this.createIEPanOverlay_(),this.createZoomHandles_(),this.initInteraction_(),this.getOption_("animatedZooms")&&(console.warn("Animated zooms and range selector are not compatible; disabling animatedZooms."),this.dygraph_.updateOptions({animatedZooms:!1},!0)),this.interfaceCreated_=!0,this.addToGraph_()},t.prototype.addToGraph_=function(){var t=this.graphDiv_=this.dygraph_.graphDiv;t.appendChild(this.bgcanvas_),t.appendChild(this.fgcanvas_),t.appendChild(this.leftZoomHandle_),t.appendChild(this.rightZoomHandle_)},t.prototype.removeFromGraph_=function(){var t=this.graphDiv_;t.removeChild(this.bgcanvas_),t.removeChild(this.fgcanvas_),t.removeChild(this.leftZoomHandle_),t.removeChild(this.rightZoomHandle_),this.graphDiv_=null},t.prototype.reserveSpace_=function(t){this.getOption_("showRangeSelector")&&t.reserveSpaceBottom(this.getOption_("rangeSelectorHeight")+4)},t.prototype.renderStaticLayer_=function(){this.updateVisibility_()&&(this.resize_(),this.drawStaticLayer_())},t.prototype.renderInteractiveLayer_=function(){this.updateVisibility_()&&!this.isChangingRange_&&(this.placeZoomHandles_(),this.drawInteractiveLayer_())},t.prototype.updateVisibility_=function(){var t=this.getOption_("showRangeSelector");if(t)this.interfaceCreated_?this.graphDiv_&&this.graphDiv_.parentNode||this.addToGraph_():this.createInterface_();else if(this.graphDiv_){this.removeFromGraph_();var e=this.dygraph_;setTimeout(function(){e.width_=0,e.resize()},1)}return t},t.prototype.resize_=function(){function t(t,e,a){var i=Dygraph.getContextPixelRatio(e);t.style.top=a.y+"px",t.style.left=a.x+"px",t.width=a.w*i,t.height=a.h*i,t.style.width=a.w+"px",t.style.height=a.h+"px",1!=i&&e.scale(i,i)}var 
e=this.dygraph_.layout_.getPlotArea(),a=0;this.dygraph_.getOptionForAxis("drawAxis","x")&&(a=this.getOption_("xAxisHeight")||this.getOption_("axisLabelFontSize")+2*this.getOption_("axisTickSize")),this.canvasRect_={x:e.x,y:e.y+e.h+a+4,w:e.w,h:this.getOption_("rangeSelectorHeight")},t(this.bgcanvas_,this.bgcanvas_ctx_,this.canvasRect_),t(this.fgcanvas_,this.fgcanvas_ctx_,this.canvasRect_)},t.prototype.createCanvases_=function(){this.bgcanvas_=Dygraph.createCanvas(),this.bgcanvas_.className="dygraph-rangesel-bgcanvas",this.bgcanvas_.style.position="absolute",this.bgcanvas_.style.zIndex=9,this.bgcanvas_ctx_=Dygraph.getContext(this.bgcanvas_),this.fgcanvas_=Dygraph.createCanvas(),this.fgcanvas_.className="dygraph-rangesel-fgcanvas",this.fgcanvas_.style.position="absolute",this.fgcanvas_.style.zIndex=9,this.fgcanvas_.style.cursor="default",this.fgcanvas_ctx_=Dygraph.getContext(this.fgcanvas_)},t.prototype.createIEPanOverlay_=function(){this.iePanOverlay_=document.createElement("div"),this.iePanOverlay_.style.position="absolute",this.iePanOverlay_.style.backgroundColor="white",this.iePanOverlay_.style.filter="alpha(opacity=0)",this.iePanOverlay_.style.display="none",this.iePanOverlay_.style.cursor="move",this.fgcanvas_.appendChild(this.iePanOverlay_)},t.prototype.createZoomHandles_=function(){var t=new Image;t.className="dygraph-rangesel-zoomhandle",t.style.position="absolute",t.style.zIndex=10,t.style.visibility="hidden",t.style.cursor="col-resize",/MSIE 7/.test(navigator.userAgent)?(t.width=7,t.height=14,t.style.backgroundColor="white",t.style.border="1px solid #333333"):(t.width=9,t.height=16,t.src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAkAAAAQCAYAAADESFVDAAAAAXNSR0IArs4c6QAAAAZiS0dEANAAzwDP4Z7KegAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAAd0SU1FB9sHGw0cMqdt1UwAAAAZdEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIEdJTVBXgQ4XAAAAaElEQVQoz+3SsRFAQBCF4Z9WJM8KCDVwownl6YXsTmCUsyKGkZzcl7zkz3YLkypgAnreFmDEpHkIwVOMfpdi9CEEN2nGpFdwD03yEqDtOgCaun7sqSTDH32I1pQA2Pb9sZecAxc5r3IAb21d6878xsAAAAAASUVORK5CYII="),this.isMobileDevice_&&(t.width*=2,t.height*=2),this.leftZoomHandle_=t,this.rightZoomHandle_=t.cloneNode(!1)},t.prototype.initInteraction_=function(){var t,e,a,i,r,n,o,s,l,h,p,g,d,u,c=this,y=document,_=0,v=null,f=!1,x=!1,m=!this.isMobileDevice_&&!this.isUsingExcanvas_,D=new Dygraph.IFrameTarp;t=function(t){var e=c.dygraph_.xAxisExtremes(),a=(e[1]-e[0])/c.canvasRect_.w,i=e[0]+(t.leftHandlePos-c.canvasRect_.x)*a,r=e[0]+(t.rightHandlePos-c.canvasRect_.x)*a;return[i,r]},e=function(t){return Dygraph.cancelEvent(t),f=!0,_=t.clientX,v=t.target?t.target:t.srcElement,("mousedown"===t.type||"dragstart"===t.type)&&(Dygraph.addEvent(y,"mousemove",a),Dygraph.addEvent(y,"mouseup",i)),c.fgcanvas_.style.cursor="col-resize",D.cover(),!0},a=function(t){if(!f)return!1;Dygraph.cancelEvent(t);var e=t.clientX-_;if(Math.abs(e)<4)return!0;_=t.clientX;var a,i=c.getZoomHandleStatus_();v==c.leftZoomHandle_?(a=i.leftHandlePos+e,a=Math.min(a,i.rightHandlePos-v.width-3),a=Math.max(a,c.canvasRect_.x)):(a=i.rightHandlePos+e,a=Math.min(a,c.canvasRect_.x+c.canvasRect_.w),a=Math.max(a,i.leftHandlePos+v.width+3));var n=v.width/2;return v.style.left=a-n+"px",c.drawInteractiveLayer_(),m&&r(),!0},i=function(t){return f?(f=!1,D.uncover(),Dygraph.removeEvent(y,"mousemove",a),Dygraph.removeEvent(y,"mouseup",i),c.fgcanvas_.style.cursor="default",m||r(),!0):!1},r=function(){try{var e=c.getZoomHandleStatus_();if(c.isChangingRange_=!0,e.isZoomed){var a=t(e);c.dygraph_.doZoomXDates_(a[0],a[1])}else 
c.dygraph_.resetZoom()}finally{c.isChangingRange_=!1}},n=function(t){if(c.isUsingExcanvas_)return t.srcElement==c.iePanOverlay_;var e=c.leftZoomHandle_.getBoundingClientRect(),a=e.left+e.width/2;e=c.rightZoomHandle_.getBoundingClientRect();var i=e.left+e.width/2;return t.clientX>a&&t.clientX=c.canvasRect_.x+c.canvasRect_.w?(r=c.canvasRect_.x+c.canvasRect_.w,i=r-n):(i+=e,r+=e);var o=c.leftZoomHandle_.width/2;return c.leftZoomHandle_.style.left=i-o+"px",c.rightZoomHandle_.style.left=r-o+"px",c.drawInteractiveLayer_(),m&&h(),!0},l=function(t){return x?(x=!1,Dygraph.removeEvent(y,"mousemove",s),Dygraph.removeEvent(y,"mouseup",l),m||h(),!0):!1},h=function(){try{c.isChangingRange_=!0,c.dygraph_.dateWindow_=t(c.getZoomHandleStatus_()),c.dygraph_.drawGraph_(!1)}finally{c.isChangingRange_=!1}},p=function(t){if(!f&&!x){var e=n(t)?"move":"default";e!=c.fgcanvas_.style.cursor&&(c.fgcanvas_.style.cursor=e)}},g=function(t){"touchstart"==t.type&&1==t.targetTouches.length?e(t.targetTouches[0])&&Dygraph.cancelEvent(t):"touchmove"==t.type&&1==t.targetTouches.length?a(t.targetTouches[0])&&Dygraph.cancelEvent(t):i(t)},d=function(t){"touchstart"==t.type&&1==t.targetTouches.length?o(t.targetTouches[0])&&Dygraph.cancelEvent(t):"touchmove"==t.type&&1==t.targetTouches.length?s(t.targetTouches[0])&&Dygraph.cancelEvent(t):l(t)},u=function(t,e){for(var a=["touchstart","touchend","touchmove","touchcancel"],i=0;it;t++){var s=this.getOption_("showInRangeSelector",r[t]);n[t]=s,null!==s&&(o=!0)}if(!o)for(t=0;t1&&(g=h.rollingAverage(g,e.rollPeriod(),p)),l.push(g)}var d=[];for(t=0;t0)&&(v=Math.min(v,x),f=Math.max(f,x))}var m=.25;if(a)for(f=Dygraph.log10(f),f+=f*m,v=Dygraph.log10(v),t=0;tthis.canvasRect_.x||a+10&&t[r][0]>o;)i--,r--}return i>=a?[a,i]:[0,t.length-1]},t.parseFloat=function(t){return null===t?0/0:t}}(),function(){"use strict";Dygraph.DataHandlers.DefaultHandler=function(){};var t=Dygraph.DataHandlers.DefaultHandler;t.prototype=new Dygraph.DataHandler,t.prototype.extractSeries=function(t,e,a){for(var i=[],r=a.get("logscale"),n=0;n=s&&(s=null),i.push([o,s])}return i},t.prototype.rollingAverage=function(t,e,a){e=Math.min(e,t.length);var i,r,n,o,s,l=[];if(1==e)return t;for(i=0;ir;r++)n=t[r][1],null===n||isNaN(n)||(s++,o+=t[r][1]);s?l[i]=[t[i][0],o/s]:l[i]=[t[i][0],null]}return l},t.prototype.getExtremeYValues=function(t,e,a){for(var i,r=null,n=null,o=0,s=t.length-1,l=o;s>=l;l++)i=t[l][1],null===i||isNaN(i)||((null===n||i>n)&&(n=i),(null===r||r>i)&&(r=i));return[r,n]}}(),function(){"use strict";Dygraph.DataHandlers.DefaultFractionHandler=function(){};var t=Dygraph.DataHandlers.DefaultFractionHandler;t.prototype=new Dygraph.DataHandlers.DefaultHandler,t.prototype.extractSeries=function(t,e,a){for(var i,r,n,o,s,l,h=[],p=100,g=a.get("logscale"),d=0;d=0&&(n-=t[i-e][2][0],o-=t[i-e][2][1]);var l=t[i][0],h=o?n/o:0;r[i]=[l,s*h]}return r}}(),function(){"use strict";Dygraph.DataHandlers.BarsHandler=function(){Dygraph.DataHandler.call(this)},Dygraph.DataHandlers.BarsHandler.prototype=new Dygraph.DataHandler;var t=Dygraph.DataHandlers.BarsHandler;t.prototype.extractSeries=function(t,e,a){},t.prototype.rollingAverage=function(t,e,a){},t.prototype.onPointsCreated_=function(t,e){for(var a=0;a=l;l++)if(i=t[l][1],null!==i&&!isNaN(i)){var h=t[l][2][0],p=t[l][2][1];h>i&&(h=i),i>p&&(p=i),(null===n||p>n)&&(n=p),(null===r||r>h)&&(r=h)}return[r,n]},t.prototype.onLineEvaluated=function(t,e,a){for(var i,r=0;r=0){var 
g=t[l-e];null===g[1]||isNaN(g[1])||(r-=g[2][0],o-=g[1],n-=g[2][1],s-=1)}s?p[l]=[t[l][0],1*o/s,[1*r/s,1*n/s]]:p[l]=[t[l][0],null,[null,null]]}return p}}(),function(){"use strict";Dygraph.DataHandlers.ErrorBarsHandler=function(){};var t=Dygraph.DataHandlers.ErrorBarsHandler;t.prototype=new Dygraph.DataHandlers.BarsHandler,t.prototype.extractSeries=function(t,e,a){for(var i,r,n,o,s=[],l=a.get("sigma"),h=a.get("logscale"),p=0;pr;r++)n=t[r][1],null===n||isNaN(n)||(l++,s+=n,p+=Math.pow(t[r][2][2],2));l?(h=Math.sqrt(p)/l,g=s/l,d[i]=[t[i][0],g,[g-u*h,g+u*h]]):(o=1==e?t[i][1]:null,d[i]=[t[i][0],o,[o,o]])}return d}}(),function(){"use strict";Dygraph.DataHandlers.FractionsBarsHandler=function(){};var t=Dygraph.DataHandlers.FractionsBarsHandler;t.prototype=new Dygraph.DataHandlers.BarsHandler,t.prototype.extractSeries=function(t,e,a){for(var i,r,n,o,s,l,h,p,g=[],d=100,u=a.get("sigma"),c=a.get("logscale"),y=0;y=0&&(p-=t[n-e][2][2],g-=t[n-e][2][3]);var u=t[n][0],c=g?p/g:0;if(h)if(g){var y=0>c?0:c,_=g,v=l*Math.sqrt(y*(1-y)/_+l*l/(4*_*_)),f=1+l*l/g;i=(y+l*l/(2*g)-v)/f,r=(y+l*l/(2*g)+v)/f,s[n]=[u,y*d,[i*d,r*d]]}else s[n]=[u,0,[0,0]];else o=g?l*Math.sqrt(c*(1-c)/g):1,s[n]=[u,d*c,[d*(c-o),d*(c+o)]]}return s}}(); +//# sourceMappingURL=dygraph-combined.js.map \ No newline at end of file diff --git a/docs/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph.css b/docs/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph.css new file mode 100644 index 000000000..4745b2fc2 --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph.css @@ -0,0 +1,8 @@ + +div .dygraphs input[type="text"] { + width: 25px; +} + +div .qt .dygraph-axis-label { + font-size: 11px; +} \ No newline at end of file diff --git a/docs/previous_versions/v0.4.0/libs/dygraphs-1.1.1/shapes.js b/docs/previous_versions/v0.4.0/libs/dygraphs-1.1.1/shapes.js new file mode 100644 index 000000000..2df07a9b8 --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/dygraphs-1.1.1/shapes.js @@ -0,0 +1,123 @@ +/** + * @license + * Copyright 2011 Dan Vanderkam (danvdk@gmail.com) + * MIT-licensed (http://opensource.org/licenses/MIT) + */ + +/** + * @fileoverview + * Including this file will add several additional shapes to Dygraph.Circles + * which can be passed to drawPointCallback. + * See tests/custom-circles.html for usage. + */ + +(function() { + +/** + * @param {!CanvasRenderingContext2D} ctx the canvas context + * @param {number} sides the number of sides in the shape. + * @param {number} radius the radius of the image. + * @param {number} cx center x coordate + * @param {number} cy center y coordinate + * @param {number=} rotationRadians the shift of the initial angle, in radians. + * @param {number=} delta the angle shift for each line. If missing, creates a + * regular polygon. + */ +var regularShape = function( + ctx, sides, radius, cx, cy, rotationRadians, delta) { + rotationRadians = rotationRadians || 0; + delta = delta || Math.PI * 2 / sides; + + ctx.beginPath(); + var initialAngle = rotationRadians; + var angle = initialAngle; + + var computeCoordinates = function() { + var x = cx + (Math.sin(angle) * radius); + var y = cy + (-Math.cos(angle) * radius); + return [x, y]; + }; + + var initialCoordinates = computeCoordinates(); + var x = initialCoordinates[0]; + var y = initialCoordinates[1]; + ctx.moveTo(x, y); + + for (var idx = 0; idx < sides; idx++) { + angle = (idx == sides - 1) ? 
initialAngle : (angle + delta); + var coords = computeCoordinates(); + ctx.lineTo(coords[0], coords[1]); + } + ctx.fill(); + ctx.stroke(); +}; + +/** + * TODO(danvk): be more specific on the return type. + * @param {number} sides + * @param {number=} rotationRadians + * @param {number=} delta + * @return {Function} + * @private + */ +var shapeFunction = function(sides, rotationRadians, delta) { + return function(g, name, ctx, cx, cy, color, radius) { + ctx.strokeStyle = color; + ctx.fillStyle = "white"; + regularShape(ctx, sides, radius, cx, cy, rotationRadians, delta); + }; +}; + +var customCircles = { + TRIANGLE : shapeFunction(3), + SQUARE : shapeFunction(4, Math.PI / 4), + DIAMOND : shapeFunction(4), + PENTAGON : shapeFunction(5), + HEXAGON : shapeFunction(6), + CIRCLE : function(g, name, ctx, cx, cy, color, radius) { + ctx.beginPath(); + ctx.strokeStyle = color; + ctx.fillStyle = "white"; + ctx.arc(cx, cy, radius, 0, 2 * Math.PI, false); + ctx.fill(); + ctx.stroke(); + }, + STAR : shapeFunction(5, 0, 4 * Math.PI / 5), + PLUS : function(g, name, ctx, cx, cy, color, radius) { + ctx.strokeStyle = color; + + ctx.beginPath(); + ctx.moveTo(cx + radius, cy); + ctx.lineTo(cx - radius, cy); + ctx.closePath(); + ctx.stroke(); + + ctx.beginPath(); + ctx.moveTo(cx, cy + radius); + ctx.lineTo(cx, cy - radius); + ctx.closePath(); + ctx.stroke(); + }, + EX : function(g, name, ctx, cx, cy, color, radius) { + ctx.strokeStyle = color; + + ctx.beginPath(); + ctx.moveTo(cx + radius, cy + radius); + ctx.lineTo(cx - radius, cy - radius); + ctx.closePath(); + ctx.stroke(); + + ctx.beginPath(); + ctx.moveTo(cx + radius, cy - radius); + ctx.lineTo(cx - radius, cy + radius); + ctx.closePath(); + ctx.stroke(); + } +}; + +for (var k in customCircles) { + if (!customCircles.hasOwnProperty(k)) continue; + Dygraph.Circles[k] = customCircles[k]; +} + +})(); diff --git a/docs/previous_versions/v0.4.0/libs/dygraphs-binding-1.1.1.6/dygraphs.js b/docs/previous_versions/v0.4.0/libs/dygraphs-binding-1.1.1.6/dygraphs.js new file mode 100644 index 000000000..3cd03913f --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/dygraphs-binding-1.1.1.6/dygraphs.js @@ -0,0 +1,789 @@ + +// polyfill indexOf for IE8 +if (!Array.prototype.indexOf) { + Array.prototype.indexOf = function(elt /*, from*/) { + var len = this.length >>> 0; + + var from = Number(arguments[1]) || 0; + from = (from < 0) + ? 
Math.ceil(from) + : Math.floor(from); + if (from < 0) + from += len; + + for (; from < len; from++) { + if (from in this && + this[from] === elt) + return from; + } + return -1; + }; +} + +HTMLWidgets.widget({ + + name: "dygraphs", + + type: "output", + + factory: function(el, width, height) { + + // reference to dygraph + var dygraph = null; + + // reference to widget global groups + var groups = this.groups; + + // add qt style if we are running under Qt + if (window.navigator.userAgent.indexOf(" Qt/") > 0) + el.className += " qt"; + + return { + + renderValue: function(x) { + + // reference to this for closures + var thiz = this; + + // get dygraph attrs and populate file field + var attrs = x.attrs; + attrs.file = x.data; + + // disable zoom interaction except for clicks + if (attrs.disableZoom) { + attrs.interactionModel = Dygraph.Interaction.nonInteractiveModel_; + } + + // convert non-arrays to arrays + for (var index = 0; index < attrs.file.length; index++) { + if (!$.isArray(attrs.file[index])) + attrs.file[index] = [].concat(attrs.file[index]); + } + + // resolve "auto" legend behavior + if (x.attrs.legend == "auto") { + if (x.data.length <= 2) + x.attrs.legend = "onmouseover"; + else + x.attrs.legend = "always"; + } + + if (x.format == "date") { + + // set appropriated function in case of fixed tz + if ((attrs.axes.x.axisLabelFormatter === undefined) && x.fixedtz) + attrs.axes.x.axisLabelFormatter = this.xAxisLabelFormatterFixedTZ(x.tzone); + + if ((attrs.axes.x.valueFormatter === undefined) && x.fixedtz) + attrs.axes.x.valueFormatter = this.xValueFormatterFixedTZ(x.scale, x.tzone); + + if ((attrs.axes.x.ticker === undefined) && x.fixedtz) + attrs.axes.x.ticker = this.customDateTickerFixedTZ(x.tzone); + + // provide an automatic x value formatter if none is already specified + if ((attrs.axes.x.valueFormatter === undefined) && (x.fixedtz != true)) + attrs.axes.x.valueFormatter = this.xValueFormatter(x.scale); + + // convert time to js time + attrs.file[0] = attrs.file[0].map(function(value) { + return thiz.normalizeDateValue(x.scale, value, x.fixedtz); + }); + if (attrs.dateWindow != null) { + attrs.dateWindow = attrs.dateWindow.map(function(value) { + var date = thiz.normalizeDateValue(x.scale, value, x.fixedtz); + return date.getTime(); + }); + } + } + + + // transpose array + attrs.file = HTMLWidgets.transposeArray2D(attrs.file); + + // add drawCallback for group + if (x.group != null) + this.addGroupDrawCallback(x); + + // add shading and event callback if necessary + this.addShadingCallback(x); + this.addEventCallback(x); + this.addZoomCallback(x); + + // disable y-axis touch events on mobile phones + if (attrs.mobileDisableYTouch !== false && this.isMobilePhone()) { + // create default interaction model if necessary + if (!attrs.interactionModel) + attrs.interactionModel = Dygraph.Interaction.defaultModel; + // disable y touch direction + attrs.interactionModel.touchstart = function(event, dygraph, context) { + Dygraph.defaultInteractionModel.touchstart(event, dygraph, context); + context.touchDirections = { x: true, y: false }; + }; + } + + // create plugins + if (x.plugins) { + attrs.plugins = []; + for (var plugin in x.plugins) { + if (x.plugins.hasOwnProperty(plugin)) { + + // get plugin options + var options = x.plugins[plugin]; + + // create plugin and add to dygraph + var p = new Dygraph.Plugins[plugin](options); + attrs.plugins.push(p); + } + } + } + + // custom plotter + if (x.plotter) { + attrs.plotter = Dygraph.Plotters[x.plotter]; + } + + // custom data handler 
+ if (x.dataHandler) { + attrs.dataHandler = Dygraph.DataHandlers[x.dataHandler]; + } + + // custom circles + if (x.pointShape) { + if (typeof x.pointShape === 'string') { + attrs.drawPointCallback = Dygraph.Circles[x.pointShape.toUpperCase()]; + attrs.drawHighlightPointCallback = Dygraph.Circles[x.pointShape.toUpperCase()]; + } else { + for (var s in x.pointShape) { + if (x.pointShape.hasOwnProperty(s)) { + attrs.series[s].drawPointCallback = Dygraph.Circles[x.pointShape[s].toUpperCase()]; + attrs.series[s].drawHighlightPointCallback = Dygraph.Circles[x.pointShape[s].toUpperCase()]; + } + } + } + } + + // if there is no existing dygraph perform initialization + if (!dygraph) { + + // subscribe to custom shown event (fired by ioslides to trigger + // shiny reactivity but we can use it as well). this is necessary + // because if a dygraph starts out as display:none it has height + // and width == 0 and this doesn't change when it becomes visible + $(el).closest('slide').on('shown', function() { + if (dygraph) + dygraph.resize(); + }); + + // do the same for reveal.js + $(el).closest('section.slide').on('shown', function() { + if (dygraph) + dygraph.resize(); + }); + + // redraw on R Markdown {.tabset} tab visibility changed + var tab = $(el).closest('div.tab-pane'); + if (tab !== null) { + var tabID = tab.attr('id'); + var tabAnchor = $('a[data-toggle="tab"][href="#' + tabID + '"]'); + if (tabAnchor !== null) { + tabAnchor.on('shown.bs.tab', function() { + if (dygraph) + dygraph.resize(); + }); + } + } + // add default font for viewer mode + if (this.queryVar("viewer_pane") === "1") + document.body.style.fontFamily = "Arial, sans-serif"; + + // inject css if necessary + if (x.css != null) { + var style = document.createElement('style'); + style.type = 'text/css'; + if (style.styleSheet) + style.styleSheet.cssText = x.css; + else + style.appendChild(document.createTextNode(x.css)); + document.getElementsByTagName("head")[0].appendChild(style); + } + + } else { + + // retain the userDateWindow if requested + if (dygraph.userDateWindow != null + && attrs.retainDateWindow == true) { + attrs.dateWindow = dygraph.xAxisRange(); + } + + // remove it from groups if it's there + if (x.group != null && groups[x.group] != null) { + var index = groups[x.group].indexOf(dygraph); + if (index != -1) + groups[x.group].splice(index, 1); + } + + // destroy the existing dygraph + dygraph.destroy(); + dygraph = null; + } + + // create the dygraph and add it to it's group (if any) + dygraph = thiz.dygraph = new Dygraph(el, attrs.file, attrs); + dygraph.userDateWindow = attrs.dateWindow; + if (x.group != null) + groups[x.group].push(dygraph); + + // add shiny inputs for date window and click + if (HTMLWidgets.shinyMode) { + var isDate = x.format == "date"; + this.addClickShinyInput(el.id, isDate); + this.addDateWindowShinyInput(el.id, isDate); + } + + // set annotations + if (x.annotations != null) { + dygraph.ready(function() { + if (x.format == "date") { + x.annotations.map(function(annotation) { + var date = thiz.normalizeDateValue(x.scale, annotation.x, x.fixedtz); + annotation.x = date.getTime(); + }); + } + dygraph.setAnnotations(x.annotations); + }); + } + + }, + + customDateTickerFixedTZ : function(tz){ + return function(t,e,a,i,r) { + var a=Dygraph.pickDateTickGranularity(t,e,a,i); + if(a >= 0){ + + var n=i("axisLabelFormatter"), + o=i("labelsUTC"), + s=o?Dygraph.DateAccessorsUTC:Dygraph.DateAccessorsLocal; + l=Dygraph.TICK_PLACEMENT[a].datefield; + h=Dygraph.TICK_PLACEMENT[a].step; + 
p=Dygraph.TICK_PLACEMENT[a].spacing; + + var y = []; + var d = moment(t); + d.tz(tz); + d.millisecond(0); + + if(l > Dygraph.DATEFIELD_M){ + var x; + if (l === Dygraph.DATEFIELD_SS) { // seconds + x = d.second(); + d.second(x - x % h); + } else if(l === Dygraph.DATEFIELD_MM){ + d.second(0) + x = d.minute(); + d.minute(x - x % h); + } else if(l === Dygraph.DATEFIELD_HH){ + d.second(0); + d.minute(0); + x = d.hour(); + d.hour(x - x % h); + } else if(l === Dygraph.DATEFIELD_D){ + d.second(0); + d.minute(0); + d.hour(0); + if (h == 7) { // one week + d.startOf('week'); + } + } + + v = d.valueOf(); + _=moment(v).tz(tz); + + // For spacings coarser than two-hourly, we want to ignore daylight + // savings transitions to get consistent ticks. For finer-grained ticks, + // it's essential to show the DST transition in all its messiness. + var start_offset_min = moment(v).tz(tz).zone(); + var check_dst = (p >= Dygraph.TICK_PLACEMENT[Dygraph.TWO_HOURLY].spacing); + + if(a<=Dygraph.HOURLY){ + for(t>v&&(v+=p,_=moment(v).tz(tz));e>=v;){ + y.push({v:v,label:n(_,a,i,r)}); + v+=p; + _=moment(v).tz(tz); + } + }else{ + for(t>v&&(v+=p,_=moment(v).tz(tz));e>=v;){ + + // This ensures that we stay on the same hourly "rhythm" across + // daylight savings transitions. Without this, the ticks could get off + // by an hour. See tests/daylight-savings.html or issue 147. + if (check_dst && _.zone() != start_offset_min) { + var delta_min = _.zone() - start_offset_min; + v += delta_min * 60 * 1000; + _= moment(v).tz(tz); + start_offset_min = _.zone(); + + // Check whether we've backed into the previous timezone again. + // This can happen during a "spring forward" transition. In this case, + // it's best to skip this tick altogether (we may be shooting for a + // non-existent time like the 2AM that's skipped) and go to the next + // one. 
+ if (moment(v + p).tz(tz).zone() != start_offset_min) { + v += p; + _= moment(v).tz(tz); + start_offset_min = _.zone(); + } + } + + (a>=Dygraph.DAILY||_.get('hour')%h===0)&&y.push({v:v,label:n(_,a,i,r)}); + v+=p; + _=moment(v).tz(tz); + } + } + }else{ + var start_year = moment(t).tz(tz).year(); + var end_year = moment(e).tz(tz).year(); + var start_month = moment(t).tz(tz).month(); + + if(l === Dygraph.DATEFIELD_M){ + var step_month = h; + for (var ii = start_year; ii <= end_year; ii++) { + for (var j = 0; j < 12;) { + var dt = moment(new Date(ii, j, 1)).tz(tz); + // fix some tz bug + dt.year(ii); + dt.month(j); + dt.date(1); + dt.hour(0); + v = dt.valueOf(); + y.push({v:v,label:n(moment(v).tz(tz),a,i,r)}); + j+=step_month; + } + } + }else{ + var step_year = h; + for (var ii = start_year; ii <= end_year;) { + var dt = moment(new Date(ii, 1, 1)).tz(tz); + // fix some tz bug + dt.year(ii); + dt.month(j); + dt.date(1); + dt.hour(0); + v = dt.valueOf(); + y.push({v:v,label:n(moment(v).tz(tz),a,i,r)}); + ii+=step_year; + } + } + } + return y; + }else{ + return []; + } + }; + }, + + xAxisLabelFormatterFixedTZ : function(tz){ + + return function dateAxisFormatter(date, granularity){ + var mmnt = moment(date).tz(tz); + if (granularity >= Dygraph.DECADAL){ + return mmnt.format('YYYY'); + }else{ + if(granularity >= Dygraph.MONTHLY){ + return mmnt.format('MMM YYYY'); + }else{ + var frac = mmnt.hour() * 3600 + mmnt.minute() * 60 + mmnt.second() + mmnt.millisecond(); + if (frac === 0 || granularity >= Dygraph.DAILY) { + return mmnt.format('DD MMM'); + } else { + if (mmnt.second()) { + return mmnt.format('HH:mm:ss'); + } else { + return mmnt.format('HH:mm'); + } + } + } + + } + } + }, + + xValueFormatterFixedTZ: function(scale, tz) { + + return function(millis) { + var mmnt = moment(millis).tz(tz); + if (scale == "yearly") + return mmnt.format('YYYY') + ' (' + mmnt.zoneAbbr() + ')'; + else if (scale == "quarterly") + return mmnt.fquarter(1) + ' (' + mmnt.zoneAbbr() + ')'; + else if (scale == "monthly") + return mmnt.format('MMM, YYYY')+ ' (' + mmnt.zoneAbbr() + ')'; + else if (scale == "daily" || scale == "weekly") + return mmnt.format('MMM, DD, YYYY')+ ' (' + mmnt.zoneAbbr() + ')'; + else + return mmnt.format('dddd, MMMM DD, YYYY HH:mm:ss')+ ' (' + mmnt.zoneAbbr() + ')'; + } + }, + + xValueFormatter: function(scale) { + + var monthNames = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", + "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]; + + return function(millis) { + var date = new Date(millis); + if (scale == "yearly") + return date.getFullYear(); + else if (scale == "quarterly") + return moment(millis).fquarter(1); + else if (scale == "monthly") + return monthNames[date.getMonth()] + ', ' + date.getFullYear(); + else if (scale == "daily" || scale == "weekly") + return monthNames[date.getMonth()] + ', ' + + date.getDate() + ', ' + + date.getFullYear(); + else + return date.toLocaleString(); + } + }, + + addZoomCallback: function(x) { + + // alias this + var thiz = this; + + // get attrs + var attrs = x.attrs; + + // check for an existing zoomCallback + var prevZoomCallback = attrs["zoomCallback"]; + + attrs.zoomCallback = function(minDate, maxDate, yRanges) { + + // call existing + if (prevZoomCallback) + prevZoomCallback(minDate, maxDate, yRanges); + + // record user date window (or lack thereof) + if (dygraph.xAxisExtremes()[0] != minDate || + dygraph.xAxisExtremes()[1] != maxDate) { + dygraph.userDateWindow = [minDate, maxDate]; + } else { + dygraph.userDateWindow = null; + } + + // record in group if 
necessary + if (x.group != null && groups[x.group] != null) { + var group = groups[x.group]; + for(var i = 0; i=0.1){ + var dashLength = dashArray[dashIndex++%dashCount]; + if (dashLength > distRemaining) dashLength = distRemaining; + var xStep = Math.sqrt( dashLength*dashLength / (1 + slope*slope) ); + if (dx<0) xStep = -xStep; + x += xStep + y += slope*xStep; + canvas[draw ? 'lineTo' : 'moveTo'](x,y); + distRemaining -= dashLength; + draw = !draw; + } + canvas.stroke(); + }, + + setFontSize: function(canvas, size) { + var cFont = canvas.font; + var parts = cFont.split(' '); + if (parts.length === 2) + canvas.font = size + 'px ' + parts[1]; + else if (parts.length === 3) + canvas.font = parts[0] + ' ' + size + 'px ' + parts[2]; + }, + + // Returns the value of a GET variable + queryVar: function(name) { + return decodeURI(window.location.search.replace( + new RegExp("^(?:.*[&\\?]" + + encodeURI(name).replace(/[\.\+\*]/g, "\\$&") + + "(?:\\=([^&]*))?)?.*$", "i"), + "$1")); + }, + + // We deal exclusively in UTC dates within R, however dygraphs deals + // exclusively in the local time zone. Therefore, in order to plot date + // labels that make sense to the user when we are dealing with days, + // months or years we need to convert the UTC date value to a local time + // value that "looks like" the equivilant UTC value. To do this we add the + // timezone offset to the UTC date. + // Don't use in case of fixedtz + normalizeDateValue: function(scale, value, fixedtz) { + var date = new Date(value); + if (scale != "minute" && scale != "hourly" && scale != "seconds" && !fixedtz) { + var localAsUTC = date.getTime() + (date.getTimezoneOffset() * 60000); + date = new Date(localAsUTC); + } + return date; + }, + + // safely detect rendering on a mobile phone + isMobilePhone: function() { + try + { + return ! 
window.matchMedia("only screen and (min-width: 768px)").matches; + } + catch(e) { + return false; + } + }, + + + resize: function(width, height) { + if (dygraph) + dygraph.resize(); + }, + + // export dygraph so other code can get a hold of it + dygraph: null + + }; + }, + + // track groups globally + groups: {} + +}); + diff --git a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/fontawesome/fontawesome-webfont.ttf b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/fontawesome/fontawesome-webfont.ttf new file mode 100644 index 000000000..35acda2fa Binary files /dev/null and b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/fontawesome/fontawesome-webfont.ttf differ diff --git a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-bookdown.css b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-bookdown.css new file mode 100644 index 000000000..8e5bb8a3c --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-bookdown.css @@ -0,0 +1,99 @@ +.book .book-header h1 { + padding-left: 20px; + padding-right: 20px; +} +.book .book-header.fixed { + position: fixed; + right: 0; + top: 0; + left: 0; + border-bottom: 1px solid rgba(0,0,0,.07); +} +span.search-highlight { + background-color: #ffff88; +} +@media (min-width: 600px) { + .book.with-summary .book-header.fixed { + left: 300px; + } +} +@media (max-width: 1240px) { + .book .book-body.fixed { + top: 50px; + } + .book .book-body.fixed .body-inner { + top: auto; + } +} +@media (max-width: 600px) { + .book.with-summary .book-header.fixed { + left: calc(100% - 60px); + min-width: 300px; + } + .book.with-summary .book-body { + transform: none; + left: calc(100% - 60px); + min-width: 300px; + } + .book .book-body.fixed { + top: 0; + } +} + +.book .book-body.fixed .body-inner { + top: 50px; +} +.book .book-body .page-wrapper .page-inner section.normal sub, .book .book-body .page-wrapper .page-inner section.normal sup { + font-size: 85%; +} + +@media print { + .book .book-summary, .book .book-body .book-header, .fa { + display: none !important; + } + .book .book-body.fixed { + left: 0px; + } + .book .book-body,.book .book-body .body-inner, .book.with-summary { + overflow: visible !important; + } +} +.kable_wrapper { + border-spacing: 20px 0; + border-collapse: separate; + border: none; + margin: auto; +} +.kable_wrapper > tbody > tr > td { + vertical-align: top; +} +.book .book-body .page-wrapper .page-inner section.normal table tr.header { + border-top-width: 2px; +} +.book .book-body .page-wrapper .page-inner section.normal table tr:last-child td { + border-bottom-width: 2px; +} +.book .book-body .page-wrapper .page-inner section.normal table td, .book .book-body .page-wrapper .page-inner section.normal table th { + border-left: none; + border-right: none; +} +.book .book-body .page-wrapper .page-inner section.normal table.kable_wrapper > tbody > tr, .book .book-body .page-wrapper .page-inner section.normal table.kable_wrapper > tbody > tr > td { + border-top: none; +} +.book .book-body .page-wrapper .page-inner section.normal table.kable_wrapper > tbody > tr:last-child > td { + border-bottom: none; +} + +div.theorem, div.lemma, div.corollary, div.proposition, div.conjecture { + font-style: italic; +} +span.theorem, span.lemma, span.corollary, span.proposition, span.conjecture { + font-style: normal; +} +div.proof:after { + content: "\25a2"; + float: right; +} +.header-section-number { + padding-right: .5em; +} diff --git 
a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-fontsettings.css b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-fontsettings.css new file mode 100644 index 000000000..87236b4c0 --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-fontsettings.css @@ -0,0 +1,292 @@ +/* + * Theme 1 + */ +.color-theme-1 .dropdown-menu { + background-color: #111111; + border-color: #7e888b; +} +.color-theme-1 .dropdown-menu .dropdown-caret .caret-inner { + border-bottom: 9px solid #111111; +} +.color-theme-1 .dropdown-menu .buttons { + border-color: #7e888b; +} +.color-theme-1 .dropdown-menu .button { + color: #afa790; +} +.color-theme-1 .dropdown-menu .button:hover { + color: #73553c; +} +/* + * Theme 2 + */ +.color-theme-2 .dropdown-menu { + background-color: #2d3143; + border-color: #272a3a; +} +.color-theme-2 .dropdown-menu .dropdown-caret .caret-inner { + border-bottom: 9px solid #2d3143; +} +.color-theme-2 .dropdown-menu .buttons { + border-color: #272a3a; +} +.color-theme-2 .dropdown-menu .button { + color: #62677f; +} +.color-theme-2 .dropdown-menu .button:hover { + color: #f4f4f5; +} +.book .book-header .font-settings .font-enlarge { + line-height: 30px; + font-size: 1.4em; +} +.book .book-header .font-settings .font-reduce { + line-height: 30px; + font-size: 1em; +} +.book.color-theme-1 .book-body { + color: #704214; + background: #f3eacb; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section { + background: #f3eacb; +} +.book.color-theme-2 .book-body { + color: #bdcadb; + background: #1c1f2b; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section { + background: #1c1f2b; +} +.book.font-size-0 .book-body .page-inner section { + font-size: 1.2rem; +} +.book.font-size-1 .book-body .page-inner section { + font-size: 1.4rem; +} +.book.font-size-2 .book-body .page-inner section { + font-size: 1.6rem; +} +.book.font-size-3 .book-body .page-inner section { + font-size: 2.2rem; +} +.book.font-size-4 .book-body .page-inner section { + font-size: 4rem; +} +.book.font-family-0 { + font-family: Georgia, serif; +} +.book.font-family-1 { + font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal { + color: #704214; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal a { + color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h2, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h3, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h4, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h5, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h6 { + color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h2 { + border-color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h6 { + color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal hr { + background-color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal blockquote { + border-color: #c4b29f; + opacity: 0.9; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal 
code { + background: #fdf6e3; + color: #657b83; + border-color: #f8df9c; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal .highlight { + background-color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table th, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table td { + border-color: #f5d06c; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table tr { + color: inherit; + background-color: #fdf6e3; + border-color: #444444; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table tr:nth-child(2n) { + background-color: #fbeecb; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal { + color: #bdcadb; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal a { + color: #3eb1d0; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h2, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h3, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h4, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h5, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h6 { + color: #fffffa; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h2 { + border-color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h6 { + color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal hr { + background-color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal blockquote { + border-color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code { + color: #9dbed8; + background: #2d3143; + border-color: #2d3143; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal .highlight { + background-color: #282a39; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table th, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table td { + border-color: #3b3f54; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table tr { + color: #b6c2d2; + background-color: #2d3143; + border-color: #3b3f54; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table tr:nth-child(2n) { + background-color: #35394b; +} +.book.color-theme-1 .book-header { + color: #afa790; + background: transparent; +} +.book.color-theme-1 .book-header .btn { + color: #afa790; +} +.book.color-theme-1 .book-header .btn:hover { + color: #73553c; + background: none; +} +.book.color-theme-1 .book-header h1 { + color: #704214; +} +.book.color-theme-2 .book-header { + color: #7e888b; + background: transparent; +} +.book.color-theme-2 .book-header .btn { + color: #3b3f54; +} +.book.color-theme-2 .book-header .btn:hover { + color: #fffff5; + background: none; +} +.book.color-theme-2 .book-header h1 { + color: #bdcadb; +} +.book.color-theme-1 .book-body .navigation { + color: #afa790; +} +.book.color-theme-1 .book-body .navigation:hover { + color: #73553c; +} +.book.color-theme-2 .book-body .navigation { + color: #383f52; +} +.book.color-theme-2 .book-body .navigation:hover { + color: 
#fffff5; +} +/* + * Theme 1 + */ +.book.color-theme-1 .book-summary { + color: #afa790; + background: #111111; + border-right: 1px solid rgba(0, 0, 0, 0.07); +} +.book.color-theme-1 .book-summary .book-search { + background: transparent; +} +.book.color-theme-1 .book-summary .book-search input, +.book.color-theme-1 .book-summary .book-search input:focus { + border: 1px solid transparent; +} +.book.color-theme-1 .book-summary ul.summary li.divider { + background: #7e888b; + box-shadow: none; +} +.book.color-theme-1 .book-summary ul.summary li i.fa-check { + color: #33cc33; +} +.book.color-theme-1 .book-summary ul.summary li.done > a { + color: #877f6a; +} +.book.color-theme-1 .book-summary ul.summary li a, +.book.color-theme-1 .book-summary ul.summary li span { + color: #877f6a; + background: transparent; + font-weight: normal; +} +.book.color-theme-1 .book-summary ul.summary li.active > a, +.book.color-theme-1 .book-summary ul.summary li a:hover { + color: #704214; + background: transparent; + font-weight: normal; +} +/* + * Theme 2 + */ +.book.color-theme-2 .book-summary { + color: #bcc1d2; + background: #2d3143; + border-right: none; +} +.book.color-theme-2 .book-summary .book-search { + background: transparent; +} +.book.color-theme-2 .book-summary .book-search input, +.book.color-theme-2 .book-summary .book-search input:focus { + border: 1px solid transparent; +} +.book.color-theme-2 .book-summary ul.summary li.divider { + background: #272a3a; + box-shadow: none; +} +.book.color-theme-2 .book-summary ul.summary li i.fa-check { + color: #33cc33; +} +.book.color-theme-2 .book-summary ul.summary li.done > a { + color: #62687f; +} +.book.color-theme-2 .book-summary ul.summary li a, +.book.color-theme-2 .book-summary ul.summary li span { + color: #c1c6d7; + background: transparent; + font-weight: 600; +} +.book.color-theme-2 .book-summary ul.summary li.active > a, +.book.color-theme-2 .book-summary ul.summary li a:hover { + color: #f4f4f5; + background: #252737; + font-weight: 600; +} diff --git a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-highlight.css b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-highlight.css new file mode 100644 index 000000000..2aabd3deb --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-highlight.css @@ -0,0 +1,426 @@ +.book .book-body .page-wrapper .page-inner section.normal pre, +.book .book-body .page-wrapper .page-inner section.normal code { + /* http://jmblog.github.com/color-themes-for-google-code-highlightjs */ + /* Tomorrow Comment */ + /* Tomorrow Red */ + /* Tomorrow Orange */ + /* Tomorrow Yellow */ + /* Tomorrow Green */ + /* Tomorrow Aqua */ + /* Tomorrow Blue */ + /* Tomorrow Purple */ +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-comment, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-comment, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-title { + color: #8e908c; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-variable, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-variable, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-attribute, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-attribute, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-tag, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-tag, +.book .book-body 
.page-wrapper .page-inner section.normal pre .hljs-regexp, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-regexp, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-constant, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-constant, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-tag .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-tag .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-pi, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-pi, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal pre .html .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal code .html .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-id, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-id, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-class, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-class, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-pseudo, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-pseudo { + color: #c82829; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-number, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-number, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-pragma, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-pragma, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-built_in, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-built_in, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-literal, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-literal, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-params, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-params, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-constant, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-constant { + color: #f5871f; +} +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-class .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-class .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-rules .hljs-attribute, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-rules .hljs-attribute { + color: #eab700; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-string, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-string, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-value, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-value, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-inheritance, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-inheritance, +.book .book-body .page-wrapper .page-inner 
section.normal pre .hljs-header, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-header, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-symbol, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-symbol, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + color: #718c00; +} +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-hexcolor, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-hexcolor { + color: #3e999f; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-function, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-function, +.book .book-body .page-wrapper .page-inner section.normal pre .python .hljs-decorator, +.book .book-body .page-wrapper .page-inner section.normal code .python .hljs-decorator, +.book .book-body .page-wrapper .page-inner section.normal pre .python .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .python .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-function .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-function .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-title .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-title .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal pre .perl .hljs-sub, +.book .book-body .page-wrapper .page-inner section.normal code .perl .hljs-sub, +.book .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .coffeescript .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .coffeescript .hljs-title { + color: #4271ae; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-function, +.book .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-function { + color: #8959a8; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs, +.book .book-body .page-wrapper .page-inner section.normal code .hljs { + display: block; + background: white; + color: #4d4d4c; + padding: 0.5em; +} +.book .book-body .page-wrapper .page-inner section.normal pre .coffeescript .javascript, +.book .book-body .page-wrapper .page-inner section.normal code .coffeescript .javascript, +.book .book-body .page-wrapper .page-inner section.normal pre .javascript .xml, +.book .book-body .page-wrapper .page-inner section.normal code .javascript .xml, +.book .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .javascript, +.book .book-body .page-wrapper .page-inner section.normal code .xml .javascript, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .vbscript, +.book .book-body .page-wrapper .page-inner section.normal code .xml .vbscript, +.book .book-body .page-wrapper .page-inner 
section.normal pre .xml .css, +.book .book-body .page-wrapper .page-inner section.normal code .xml .css, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + opacity: 0.5; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code { + /* + +Orginal Style from ethanschoonover.com/solarized (c) Jeremy Hull + +*/ + /* Solarized Green */ + /* Solarized Cyan */ + /* Solarized Blue */ + /* Solarized Yellow */ + /* Solarized Orange */ + /* Solarized Red */ + /* Solarized Violet */ +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs { + display: block; + padding: 0.5em; + background: #fdf6e3; + color: #657b83; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-template_comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-template_comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .diff .hljs-header, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .diff .hljs-header, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-doctype, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-doctype, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-pi, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-pi, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .lisp .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .lisp .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-javadoc, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-javadoc { + color: #93a1a1; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-winutils, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-winutils, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .method, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .method, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-addition, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-addition, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-tag, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .css .hljs-tag, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-request, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-request, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-status, +.book.color-theme-1 .book-body .page-wrapper .page-inner 
section.normal code .hljs-status, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .nginx .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .nginx .hljs-title { + color: #859900; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-command, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-command, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-tag .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-tag .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-rules .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-rules .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-phpdoc, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-phpdoc, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-regexp, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-regexp, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-hexcolor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-hexcolor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-link_url, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-link_url { + color: #2aa198; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-localvars, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-localvars, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-chunk, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-chunk, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-decorator, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-decorator, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-built_in, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-built_in, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-identifier, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-identifier, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .vhdl .hljs-literal, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .vhdl .hljs-literal, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal 
pre .hljs-id, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-id, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-function, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .css .hljs-function { + color: #268bd2; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-attribute, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-attribute, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-variable, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-variable, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .lisp .hljs-body, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .lisp .hljs-body, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .smalltalk .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .smalltalk .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-constant, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-constant, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-class .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-class .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-parent, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-parent, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .haskell .hljs-type, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .haskell .hljs-type, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-link_reference, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-link_reference { + color: #b58900; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-pragma, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-pragma, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-shebang, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-shebang, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-symbol, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-symbol, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-symbol .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-symbol .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .diff .hljs-change, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .diff .hljs-change, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-special, 
+.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-special, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-attr_selector, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-attr_selector, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-subst, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-subst, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-cdata, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-cdata, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .clojure .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .clojure .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-pseudo, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .css .hljs-pseudo, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-header, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-header { + color: #cb4b16; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-deletion, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-deletion, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-important, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-important { + color: #dc322f; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-link_label, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-link_label { + color: #6c71c4; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula { + background: #eee8d5; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code { + /* Tomorrow Night Bright Theme */ + /* Original theme - https://github.com/chriskempson/tomorrow-theme */ + /* http://jmblog.github.com/color-themes-for-google-code-highlightjs */ + /* Tomorrow Comment */ + /* Tomorrow Red */ + /* Tomorrow Orange */ + /* Tomorrow Yellow */ + /* Tomorrow Green */ + /* Tomorrow Aqua */ + /* Tomorrow Blue */ + /* Tomorrow Purple */ +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-comment, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-comment, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-title { + color: #969896; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-variable, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-variable, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-attribute, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-attribute, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-tag, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-tag, 
+.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-regexp, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-regexp, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-constant, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-constant, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-tag .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-tag .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-pi, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-pi, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .html .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .html .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-id, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-id, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-class, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-class, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-pseudo, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-pseudo { + color: #d54e53; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-number, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-number, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-pragma, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-pragma, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-built_in, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-built_in, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-literal, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-literal, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-params, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-params, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-constant, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-constant { + color: #e78c45; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-class .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-class .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-rules .hljs-attribute, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-rules .hljs-attribute { + color: #e7c547; +} 
+.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-string, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-string, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-value, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-value, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-inheritance, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-inheritance, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-header, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-header, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-symbol, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-symbol, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + color: #b9ca4a; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-hexcolor, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-hexcolor { + color: #70c0b1; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-function, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-function, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .python .hljs-decorator, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .python .hljs-decorator, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .python .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .python .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-function .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-function .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-title .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-title .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .perl .hljs-sub, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .perl .hljs-sub, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .coffeescript .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .coffeescript .hljs-title { + color: #7aa6da; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-function, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-function { + color: #c397d8; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs, +.book.color-theme-2 
.book-body .page-wrapper .page-inner section.normal code .hljs { + display: block; + background: black; + color: #eaeaea; + padding: 0.5em; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .coffeescript .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .coffeescript .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .javascript .xml, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .javascript .xml, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .vbscript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .vbscript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .css, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .css, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + opacity: 0.5; +} diff --git a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-search.css b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-search.css new file mode 100644 index 000000000..d7ff2d991 --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-search.css @@ -0,0 +1,28 @@ +.book .book-summary .book-search { + padding: 6px; + background: transparent; + position: absolute; + top: -50px; + left: 0px; + right: 0px; + transition: top 0.5s ease; +} +.book .book-summary .book-search input, +.book .book-summary .book-search input:focus, +.book .book-summary .book-search input:hover { + width: 100%; + background: transparent; + border: 1px solid #ccc; + box-shadow: none; + outline: none; + line-height: 22px; + padding: 7px 4px; + color: inherit; + box-sizing: border-box; +} +.book.with-search .book-summary .book-search { + top: 0px; +} +.book.with-search .book-summary ul.summary { + top: 50px; +} diff --git a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-table.css b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-table.css new file mode 100644 index 000000000..7fba1b9fb --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-table.css @@ -0,0 +1 @@ +.book .book-body .page-wrapper .page-inner section.normal table{display:table;width:100%;border-collapse:collapse;border-spacing:0;overflow:auto}.book .book-body .page-wrapper .page-inner section.normal table td,.book .book-body .page-wrapper .page-inner section.normal table th{padding:6px 13px;border:1px solid #ddd}.book .book-body .page-wrapper .page-inner section.normal table tr{background-color:#fff;border-top:1px solid #ccc}.book .book-body .page-wrapper .page-inner section.normal table tr:nth-child(2n){background-color:#f8f8f8}.book .book-body .page-wrapper .page-inner section.normal table th{font-weight:700} diff --git a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/style.css b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/style.css new file mode 100644 index 
000000000..b89689209 --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/style.css @@ -0,0 +1,10 @@ +/*! normalize.css v2.1.0 | MIT License | git.io/normalize */img,legend{border:0}*,.fa{-webkit-font-smoothing:antialiased}.fa-ul>li,sub,sup{position:relative}.book .book-body .page-wrapper .page-inner section.normal hr:after,.book-langs-index .inner .languages:after,.buttons:after,.dropdown-menu .buttons:after{clear:both}body,html{-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%}article,aside,details,figcaption,figure,footer,header,hgroup,main,nav,section,summary{display:block}audio,canvas,video{display:inline-block}.hidden,[hidden]{display:none}audio:not([controls]){display:none;height:0}html{font-family:sans-serif}body,figure{margin:0}a:focus{outline:dotted thin}a:active,a:hover{outline:0}h1{font-size:2em;margin:.67em 0}abbr[title]{border-bottom:1px dotted}b,strong{font-weight:700}dfn{font-style:italic}hr{-moz-box-sizing:content-box;box-sizing:content-box;height:0}mark{background:#ff0;color:#000}code,kbd,pre,samp{font-family:monospace,serif;font-size:1em}pre{white-space:pre-wrap}q{quotes:"\201C" "\201D" "\2018" "\2019"}small{font-size:80%}sub,sup{font-size:75%;line-height:0;vertical-align:baseline}sup{top:-.5em}sub{bottom:-.25em}svg:not(:root){overflow:hidden}fieldset{border:1px solid silver;margin:0 2px;padding:.35em .625em .75em}legend{padding:0}button,input,select,textarea{font-family:inherit;font-size:100%;margin:0}button,input{line-height:normal}button,select{text-transform:none}button,html input[type=button],input[type=reset],input[type=submit]{-webkit-appearance:button;cursor:pointer}button[disabled],html input[disabled]{cursor:default}input[type=checkbox],input[type=radio]{box-sizing:border-box;padding:0}input[type=search]{-webkit-appearance:textfield;-moz-box-sizing:content-box;-webkit-box-sizing:content-box;box-sizing:content-box}input[type=search]::-webkit-search-cancel-button{margin-right:10px;}button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0}textarea{overflow:auto;vertical-align:top}table{border-collapse:collapse;border-spacing:0}/*! + * Preboot v2 + * + * Open sourced under MIT license by @mdo. + * Some variables and mixins from Bootstrap (Apache 2 license). + */.link-inherit,.link-inherit:focus,.link-inherit:hover{color:inherit}.fa,.fa-stack{display:inline-block}/*! 
+ * Font Awesome 4.1.0 by @davegandy - http://fontawesome.io - @fontawesome + * License - http://fontawesome.io/license (Font: SIL OFL 1.1, CSS: MIT License) + */@font-face{font-family:FontAwesome;src:url(./fontawesome/fontawesome-webfont.ttf?v=4.1.0) format('truetype');font-weight:400;font-style:normal}.fa{font-family:FontAwesome;font-style:normal;font-weight:400;line-height:1;-moz-osx-font-smoothing:grayscale}.book .book-header,.book .book-summary{font-family:"Helvetica Neue",Helvetica,Arial,sans-serif}.fa-lg{font-size:1.33333333em;line-height:.75em;vertical-align:-15%}.fa-2x{font-size:2em}.fa-3x{font-size:3em}.fa-4x{font-size:4em}.fa-5x{font-size:5em}.fa-fw{width:1.28571429em;text-align:center}.fa-ul{padding-left:0;margin-left:2.14285714em;list-style-type:none}.fa-li{position:absolute;left:-2.14285714em;width:2.14285714em;top:.14285714em;text-align:center}.fa-li.fa-lg{left:-1.85714286em}.fa-border{padding:.2em .25em .15em;border:.08em solid #eee;border-radius:.1em}.pull-right{float:right}.pull-left{float:left}.fa.pull-left{margin-right:.3em}.fa.pull-right{margin-left:.3em}.fa-spin{-webkit-animation:spin 2s infinite linear;-moz-animation:spin 2s infinite linear;-o-animation:spin 2s infinite linear;animation:spin 2s infinite linear}@-moz-keyframes spin{0%{-moz-transform:rotate(0)}100%{-moz-transform:rotate(359deg)}}@-webkit-keyframes spin{0%{-webkit-transform:rotate(0)}100%{-webkit-transform:rotate(359deg)}}@-o-keyframes spin{0%{-o-transform:rotate(0)}100%{-o-transform:rotate(359deg)}}@keyframes spin{0%{-webkit-transform:rotate(0);transform:rotate(0)}100%{-webkit-transform:rotate(359deg);transform:rotate(359deg)}}.fa-rotate-90{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=1);-webkit-transform:rotate(90deg);-moz-transform:rotate(90deg);-ms-transform:rotate(90deg);-o-transform:rotate(90deg);transform:rotate(90deg)}.fa-rotate-180{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=2);-webkit-transform:rotate(180deg);-moz-transform:rotate(180deg);-ms-transform:rotate(180deg);-o-transform:rotate(180deg);transform:rotate(180deg)}.fa-rotate-270{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=3);-webkit-transform:rotate(270deg);-moz-transform:rotate(270deg);-ms-transform:rotate(270deg);-o-transform:rotate(270deg);transform:rotate(270deg)}.fa-flip-horizontal{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=0, mirror=1);-webkit-transform:scale(-1,1);-moz-transform:scale(-1,1);-ms-transform:scale(-1,1);-o-transform:scale(-1,1);transform:scale(-1,1)}.fa-flip-vertical{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=2, 
mirror=1);-webkit-transform:scale(1,-1);-moz-transform:scale(1,-1);-ms-transform:scale(1,-1);-o-transform:scale(1,-1);transform:scale(1,-1)}.fa-stack{position:relative;width:2em;height:2em;line-height:2em;vertical-align:middle}.fa-stack-1x,.fa-stack-2x{position:absolute;left:0;width:100%;text-align:center}.fa-stack-1x{line-height:inherit}.fa-stack-2x{font-size:2em}.fa-inverse{color:#fff}.fa-glass:before{content:"\f000"}.fa-music:before{content:"\f001"}.fa-search:before{content:"\f002"}.fa-envelope-o:before{content:"\f003"}.fa-heart:before{content:"\f004"}.fa-star:before{content:"\f005"}.fa-star-o:before{content:"\f006"}.fa-user:before{content:"\f007"}.fa-film:before{content:"\f008"}.fa-th-large:before{content:"\f009"}.fa-th:before{content:"\f00a"}.fa-th-list:before{content:"\f00b"}.fa-check:before{content:"\f00c"}.fa-times:before{content:"\f00d"}.fa-search-plus:before{content:"\f00e"}.fa-search-minus:before{content:"\f010"}.fa-power-off:before{content:"\f011"}.fa-signal:before{content:"\f012"}.fa-cog:before,.fa-gear:before{content:"\f013"}.fa-trash-o:before{content:"\f014"}.fa-home:before{content:"\f015"}.fa-file-o:before{content:"\f016"}.fa-clock-o:before{content:"\f017"}.fa-road:before{content:"\f018"}.fa-download:before{content:"\f019"}.fa-arrow-circle-o-down:before{content:"\f01a"}.fa-arrow-circle-o-up:before{content:"\f01b"}.fa-inbox:before{content:"\f01c"}.fa-play-circle-o:before{content:"\f01d"}.fa-repeat:before,.fa-rotate-right:before{content:"\f01e"}.fa-refresh:before{content:"\f021"}.fa-list-alt:before{content:"\f022"}.fa-lock:before{content:"\f023"}.fa-flag:before{content:"\f024"}.fa-headphones:before{content:"\f025"}.fa-volume-off:before{content:"\f026"}.fa-volume-down:before{content:"\f027"}.fa-volume-up:before{content:"\f028"}.fa-qrcode:before{content:"\f029"}.fa-barcode:before{content:"\f02a"}.fa-tag:before{content:"\f02b"}.fa-tags:before{content:"\f02c"}.fa-book:before{content:"\f02d"}.fa-bookmark:before{content:"\f02e"}.fa-print:before{content:"\f02f"}.fa-camera:before{content:"\f030"}.fa-font:before{content:"\f031"}.fa-bold:before{content:"\f032"}.fa-italic:before{content:"\f033"}.fa-text-height:before{content:"\f034"}.fa-text-width:before{content:"\f035"}.fa-align-left:before{content:"\f036"}.fa-align-center:before{content:"\f037"}.fa-align-right:before{content:"\f038"}.fa-align-justify:before{content:"\f039"}.fa-list:before{content:"\f03a"}.fa-dedent:before,.fa-outdent:before{content:"\f03b"}.fa-indent:before{content:"\f03c"}.fa-video-camera:before{content:"\f03d"}.fa-image:before,.fa-photo:before,.fa-picture-o:before{content:"\f03e"}.fa-pencil:before{content:"\f040"}.fa-map-marker:before{content:"\f041"}.fa-adjust:before{content:"\f042"}.fa-tint:before{content:"\f043"}.fa-edit:before,.fa-pencil-square-o:before{content:"\f044"}.fa-share-square-o:before{content:"\f045"}.fa-check-square-o:before{content:"\f046"}.fa-arrows:before{content:"\f047"}.fa-step-backward:before{content:"\f048"}.fa-fast-backward:before{content:"\f049"}.fa-backward:before{content:"\f04a"}.fa-play:before{content:"\f04b"}.fa-pause:before{content:"\f04c"}.fa-stop:before{content:"\f04d"}.fa-forward:before{content:"\f04e"}.fa-fast-forward:before{content:"\f050"}.fa-step-forward:before{content:"\f051"}.fa-eject:before{content:"\f052"}.fa-chevron-left:before{content:"\f053"}.fa-chevron-right:before{content:"\f054"}.fa-plus-circle:before{content:"\f055"}.fa-minus-circle:before{content:"\f056"}.fa-times-circle:before{content:"\f057"}.fa-check-circle:before{content:"\f058"}.fa-question-circle:before{content:"\
f059"}.fa-info-circle:before{content:"\f05a"}.fa-crosshairs:before{content:"\f05b"}.fa-times-circle-o:before{content:"\f05c"}.fa-check-circle-o:before{content:"\f05d"}.fa-ban:before{content:"\f05e"}.fa-arrow-left:before{content:"\f060"}.fa-arrow-right:before{content:"\f061"}.fa-arrow-up:before{content:"\f062"}.fa-arrow-down:before{content:"\f063"}.fa-mail-forward:before,.fa-share:before{content:"\f064"}.fa-expand:before{content:"\f065"}.fa-compress:before{content:"\f066"}.fa-plus:before{content:"\f067"}.fa-minus:before{content:"\f068"}.fa-asterisk:before{content:"\f069"}.fa-exclamation-circle:before{content:"\f06a"}.fa-gift:before{content:"\f06b"}.fa-leaf:before{content:"\f06c"}.fa-fire:before{content:"\f06d"}.fa-eye:before{content:"\f06e"}.fa-eye-slash:before{content:"\f070"}.fa-exclamation-triangle:before,.fa-warning:before{content:"\f071"}.fa-plane:before{content:"\f072"}.fa-calendar:before{content:"\f073"}.fa-random:before{content:"\f074"}.fa-comment:before{content:"\f075"}.fa-magnet:before{content:"\f076"}.fa-chevron-up:before{content:"\f077"}.fa-chevron-down:before{content:"\f078"}.fa-retweet:before{content:"\f079"}.fa-shopping-cart:before{content:"\f07a"}.fa-folder:before{content:"\f07b"}.fa-folder-open:before{content:"\f07c"}.fa-arrows-v:before{content:"\f07d"}.fa-arrows-h:before{content:"\f07e"}.fa-bar-chart-o:before{content:"\f080"}.fa-twitter-square:before{content:"\f081"}.fa-facebook-square:before{content:"\f082"}.fa-camera-retro:before{content:"\f083"}.fa-key:before{content:"\f084"}.fa-cogs:before,.fa-gears:before{content:"\f085"}.fa-comments:before{content:"\f086"}.fa-thumbs-o-up:before{content:"\f087"}.fa-thumbs-o-down:before{content:"\f088"}.fa-star-half:before{content:"\f089"}.fa-heart-o:before{content:"\f08a"}.fa-sign-out:before{content:"\f08b"}.fa-linkedin-square:before{content:"\f08c"}.fa-thumb-tack:before{content:"\f08d"}.fa-external-link:before{content:"\f08e"}.fa-sign-in:before{content:"\f090"}.fa-trophy:before{content:"\f091"}.fa-github-square:before{content:"\f092"}.fa-upload:before{content:"\f093"}.fa-lemon-o:before{content:"\f094"}.fa-phone:before{content:"\f095"}.fa-square-o:before{content:"\f096"}.fa-bookmark-o:before{content:"\f097"}.fa-phone-square:before{content:"\f098"}.fa-twitter:before{content:"\f099"}.fa-facebook:before{content:"\f09a"}.fa-github:before{content:"\f09b"}.fa-unlock:before{content:"\f09c"}.fa-credit-card:before{content:"\f09d"}.fa-rss:before{content:"\f09e"}.fa-hdd-o:before{content:"\f0a0"}.fa-bullhorn:before{content:"\f0a1"}.fa-bell:before{content:"\f0f3"}.fa-certificate:before{content:"\f0a3"}.fa-hand-o-right:before{content:"\f0a4"}.fa-hand-o-left:before{content:"\f0a5"}.fa-hand-o-up:before{content:"\f0a6"}.fa-hand-o-down:before{content:"\f0a7"}.fa-arrow-circle-left:before{content:"\f0a8"}.fa-arrow-circle-right:before{content:"\f0a9"}.fa-arrow-circle-up:before{content:"\f0aa"}.fa-arrow-circle-down:before{content:"\f0ab"}.fa-globe:before{content:"\f0ac"}.fa-wrench:before{content:"\f0ad"}.fa-tasks:before{content:"\f0ae"}.fa-filter:before{content:"\f0b0"}.fa-briefcase:before{content:"\f0b1"}.fa-arrows-alt:before{content:"\f0b2"}.fa-group:before,.fa-users:before{content:"\f0c0"}.fa-chain:before,.fa-link:before{content:"\f0c1"}.fa-cloud:before{content:"\f0c2"}.fa-flask:before{content:"\f0c3"}.fa-cut:before,.fa-scissors:before{content:"\f0c4"}.fa-copy:before,.fa-files-o:before{content:"\f0c5"}.fa-paperclip:before{content:"\f0c6"}.fa-floppy-o:before,.fa-save:before{content:"\f0c7"}.fa-square:before{content:"\f0c8"}.fa-bars:before,.fa-navicon:befo
re,.fa-reorder:before{content:"\f0c9"}.fa-list-ul:before{content:"\f0ca"}.fa-list-ol:before{content:"\f0cb"}.fa-strikethrough:before{content:"\f0cc"}.fa-underline:before{content:"\f0cd"}.fa-table:before{content:"\f0ce"}.fa-magic:before{content:"\f0d0"}.fa-truck:before{content:"\f0d1"}.fa-pinterest:before{content:"\f0d2"}.fa-pinterest-square:before{content:"\f0d3"}.fa-google-plus-square:before{content:"\f0d4"}.fa-google-plus:before{content:"\f0d5"}.fa-money:before{content:"\f0d6"}.fa-caret-down:before{content:"\f0d7"}.fa-caret-up:before{content:"\f0d8"}.fa-caret-left:before{content:"\f0d9"}.fa-caret-right:before{content:"\f0da"}.fa-columns:before{content:"\f0db"}.fa-sort:before,.fa-unsorted:before{content:"\f0dc"}.fa-sort-desc:before,.fa-sort-down:before{content:"\f0dd"}.fa-sort-asc:before,.fa-sort-up:before{content:"\f0de"}.fa-envelope:before{content:"\f0e0"}.fa-linkedin:before{content:"\f0e1"}.fa-rotate-left:before,.fa-undo:before{content:"\f0e2"}.fa-gavel:before,.fa-legal:before{content:"\f0e3"}.fa-dashboard:before,.fa-tachometer:before{content:"\f0e4"}.fa-comment-o:before{content:"\f0e5"}.fa-comments-o:before{content:"\f0e6"}.fa-bolt:before,.fa-flash:before{content:"\f0e7"}.fa-sitemap:before{content:"\f0e8"}.fa-umbrella:before{content:"\f0e9"}.fa-clipboard:before,.fa-paste:before{content:"\f0ea"}.fa-lightbulb-o:before{content:"\f0eb"}.fa-exchange:before{content:"\f0ec"}.fa-cloud-download:before{content:"\f0ed"}.fa-cloud-upload:before{content:"\f0ee"}.fa-user-md:before{content:"\f0f0"}.fa-stethoscope:before{content:"\f0f1"}.fa-suitcase:before{content:"\f0f2"}.fa-bell-o:before{content:"\f0a2"}.fa-coffee:before{content:"\f0f4"}.fa-cutlery:before{content:"\f0f5"}.fa-file-text-o:before{content:"\f0f6"}.fa-building-o:before{content:"\f0f7"}.fa-hospital-o:before{content:"\f0f8"}.fa-ambulance:before{content:"\f0f9"}.fa-medkit:before{content:"\f0fa"}.fa-fighter-jet:before{content:"\f0fb"}.fa-beer:before{content:"\f0fc"}.fa-h-square:before{content:"\f0fd"}.fa-plus-square:before{content:"\f0fe"}.fa-angle-double-left:before{content:"\f100"}.fa-angle-double-right:before{content:"\f101"}.fa-angle-double-up:before{content:"\f102"}.fa-angle-double-down:before{content:"\f103"}.fa-angle-left:before{content:"\f104"}.fa-angle-right:before{content:"\f105"}.fa-angle-up:before{content:"\f106"}.fa-angle-down:before{content:"\f107"}.fa-desktop:before{content:"\f108"}.fa-laptop:before{content:"\f109"}.fa-tablet:before{content:"\f10a"}.fa-mobile-phone:before,.fa-mobile:before{content:"\f10b"}.fa-circle-o:before{content:"\f10c"}.fa-quote-left:before{content:"\f10d"}.fa-quote-right:before{content:"\f10e"}.fa-spinner:before{content:"\f110"}.fa-circle:before{content:"\f111"}.fa-mail-reply:before,.fa-reply:before{content:"\f112"}.fa-github-alt:before{content:"\f113"}.fa-folder-o:before{content:"\f114"}.fa-folder-open-o:before{content:"\f115"}.fa-smile-o:before{content:"\f118"}.fa-frown-o:before{content:"\f119"}.fa-meh-o:before{content:"\f11a"}.fa-gamepad:before{content:"\f11b"}.fa-keyboard-o:before{content:"\f11c"}.fa-flag-o:before{content:"\f11d"}.fa-flag-checkered:before{content:"\f11e"}.fa-terminal:before{content:"\f120"}.fa-code:before{content:"\f121"}.fa-mail-reply-all:before,.fa-reply-all:before{content:"\f122"}.fa-star-half-empty:before,.fa-star-half-full:before,.fa-star-half-o:before{content:"\f123"}.fa-location-arrow:before{content:"\f124"}.fa-crop:before{content:"\f125"}.fa-code-fork:before{content:"\f126"}.fa-chain-broken:before,.fa-unlink:before{content:"\f127"}.fa-question:before{content:"\f128"}.fa-info:b
efore{content:"\f129"}.fa-exclamation:before{content:"\f12a"}.fa-superscript:before{content:"\f12b"}.fa-subscript:before{content:"\f12c"}.fa-eraser:before{content:"\f12d"}.fa-puzzle-piece:before{content:"\f12e"}.fa-microphone:before{content:"\f130"}.fa-microphone-slash:before{content:"\f131"}.fa-shield:before{content:"\f132"}.fa-calendar-o:before{content:"\f133"}.fa-fire-extinguisher:before{content:"\f134"}.fa-rocket:before{content:"\f135"}.fa-maxcdn:before{content:"\f136"}.fa-chevron-circle-left:before{content:"\f137"}.fa-chevron-circle-right:before{content:"\f138"}.fa-chevron-circle-up:before{content:"\f139"}.fa-chevron-circle-down:before{content:"\f13a"}.fa-html5:before{content:"\f13b"}.fa-css3:before{content:"\f13c"}.fa-anchor:before{content:"\f13d"}.fa-unlock-alt:before{content:"\f13e"}.fa-bullseye:before{content:"\f140"}.fa-ellipsis-h:before{content:"\f141"}.fa-ellipsis-v:before{content:"\f142"}.fa-rss-square:before{content:"\f143"}.fa-play-circle:before{content:"\f144"}.fa-ticket:before{content:"\f145"}.fa-minus-square:before{content:"\f146"}.fa-minus-square-o:before{content:"\f147"}.fa-level-up:before{content:"\f148"}.fa-level-down:before{content:"\f149"}.fa-check-square:before{content:"\f14a"}.fa-pencil-square:before{content:"\f14b"}.fa-external-link-square:before{content:"\f14c"}.fa-share-square:before{content:"\f14d"}.fa-compass:before{content:"\f14e"}.fa-caret-square-o-down:before,.fa-toggle-down:before{content:"\f150"}.fa-caret-square-o-up:before,.fa-toggle-up:before{content:"\f151"}.fa-caret-square-o-right:before,.fa-toggle-right:before{content:"\f152"}.fa-eur:before,.fa-euro:before{content:"\f153"}.fa-gbp:before{content:"\f154"}.fa-dollar:before,.fa-usd:before{content:"\f155"}.fa-inr:before,.fa-rupee:before{content:"\f156"}.fa-cny:before,.fa-jpy:before,.fa-rmb:before,.fa-yen:before{content:"\f157"}.fa-rouble:before,.fa-rub:before,.fa-ruble:before{content:"\f158"}.fa-krw:before,.fa-won:before{content:"\f159"}.fa-bitcoin:before,.fa-btc:before{content:"\f15a"}.fa-file:before{content:"\f15b"}.fa-file-text:before{content:"\f15c"}.fa-sort-alpha-asc:before{content:"\f15d"}.fa-sort-alpha-desc:before{content:"\f15e"}.fa-sort-amount-asc:before{content:"\f160"}.fa-sort-amount-desc:before{content:"\f161"}.fa-sort-numeric-asc:before{content:"\f162"}.fa-sort-numeric-desc:before{content:"\f163"}.fa-thumbs-up:before{content:"\f164"}.fa-thumbs-down:before{content:"\f165"}.fa-youtube-square:before{content:"\f166"}.fa-youtube:before{content:"\f167"}.fa-xing:before{content:"\f168"}.fa-xing-square:before{content:"\f169"}.fa-youtube-play:before{content:"\f16a"}.fa-dropbox:before{content:"\f16b"}.fa-stack-overflow:before{content:"\f16c"}.fa-instagram:before{content:"\f16d"}.fa-flickr:before{content:"\f16e"}.fa-adn:before{content:"\f170"}.fa-bitbucket:before{content:"\f171"}.fa-bitbucket-square:before{content:"\f172"}.fa-tumblr:before{content:"\f173"}.fa-tumblr-square:before{content:"\f174"}.fa-long-arrow-down:before{content:"\f175"}.fa-long-arrow-up:before{content:"\f176"}.fa-long-arrow-left:before{content:"\f177"}.fa-long-arrow-right:before{content:"\f178"}.fa-apple:before{content:"\f179"}.fa-windows:before{content:"\f17a"}.fa-android:before{content:"\f17b"}.fa-linux:before{content:"\f17c"}.fa-dribbble:before{content:"\f17d"}.fa-skype:before{content:"\f17e"}.fa-foursquare:before{content:"\f180"}.fa-trello:before{content:"\f181"}.fa-female:before{content:"\f182"}.fa-male:before{content:"\f183"}.fa-gittip:before{content:"\f184"}.fa-sun-o:before{content:"\f185"}.fa-moon-o:before{content:"\f186"}.fa-a
rchive:before{content:"\f187"}.fa-bug:before{content:"\f188"}.fa-vk:before{content:"\f189"}.fa-weibo:before{content:"\f18a"}.fa-renren:before{content:"\f18b"}.fa-pagelines:before{content:"\f18c"}.fa-stack-exchange:before{content:"\f18d"}.fa-arrow-circle-o-right:before{content:"\f18e"}.fa-arrow-circle-o-left:before{content:"\f190"}.fa-caret-square-o-left:before,.fa-toggle-left:before{content:"\f191"}.fa-dot-circle-o:before{content:"\f192"}.fa-wheelchair:before{content:"\f193"}.fa-vimeo-square:before{content:"\f194"}.fa-try:before,.fa-turkish-lira:before{content:"\f195"}.fa-plus-square-o:before{content:"\f196"}.fa-space-shuttle:before{content:"\f197"}.fa-slack:before{content:"\f198"}.fa-envelope-square:before{content:"\f199"}.fa-wordpress:before{content:"\f19a"}.fa-openid:before{content:"\f19b"}.fa-bank:before,.fa-institution:before,.fa-university:before{content:"\f19c"}.fa-graduation-cap:before,.fa-mortar-board:before{content:"\f19d"}.fa-yahoo:before{content:"\f19e"}.fa-google:before{content:"\f1a0"}.fa-reddit:before{content:"\f1a1"}.fa-reddit-square:before{content:"\f1a2"}.fa-stumbleupon-circle:before{content:"\f1a3"}.fa-stumbleupon:before{content:"\f1a4"}.fa-delicious:before{content:"\f1a5"}.fa-digg:before{content:"\f1a6"}.fa-pied-piper-square:before,.fa-pied-piper:before{content:"\f1a7"}.fa-pied-piper-alt:before{content:"\f1a8"}.fa-drupal:before{content:"\f1a9"}.fa-joomla:before{content:"\f1aa"}.fa-language:before{content:"\f1ab"}.fa-fax:before{content:"\f1ac"}.fa-building:before{content:"\f1ad"}.fa-child:before{content:"\f1ae"}.fa-paw:before{content:"\f1b0"}.fa-spoon:before{content:"\f1b1"}.fa-cube:before{content:"\f1b2"}.fa-cubes:before{content:"\f1b3"}.fa-behance:before{content:"\f1b4"}.fa-behance-square:before{content:"\f1b5"}.fa-steam:before{content:"\f1b6"}.fa-steam-square:before{content:"\f1b7"}.fa-recycle:before{content:"\f1b8"}.fa-automobile:before,.fa-car:before{content:"\f1b9"}.fa-cab:before,.fa-taxi:before{content:"\f1ba"}.fa-tree:before{content:"\f1bb"}.fa-spotify:before{content:"\f1bc"}.fa-deviantart:before{content:"\f1bd"}.fa-soundcloud:before{content:"\f1be"}.fa-database:before{content:"\f1c0"}.fa-file-pdf-o:before{content:"\f1c1"}.fa-file-word-o:before{content:"\f1c2"}.fa-file-excel-o:before{content:"\f1c3"}.fa-file-powerpoint-o:before{content:"\f1c4"}.fa-file-image-o:before,.fa-file-photo-o:before,.fa-file-picture-o:before{content:"\f1c5"}.fa-file-archive-o:before,.fa-file-zip-o:before{content:"\f1c6"}.fa-file-audio-o:before,.fa-file-sound-o:before{content:"\f1c7"}.fa-file-movie-o:before,.fa-file-video-o:before{content:"\f1c8"}.fa-file-code-o:before{content:"\f1c9"}.fa-vine:before{content:"\f1ca"}.fa-codepen:before{content:"\f1cb"}.fa-jsfiddle:before{content:"\f1cc"}.fa-life-bouy:before,.fa-life-ring:before,.fa-life-saver:before,.fa-support:before{content:"\f1cd"}.fa-circle-o-notch:before{content:"\f1ce"}.fa-ra:before,.fa-rebel:before{content:"\f1d0"}.fa-empire:before,.fa-ge:before{content:"\f1d1"}.fa-git-square:before{content:"\f1d2"}.fa-git:before{content:"\f1d3"}.fa-hacker-news:before{content:"\f1d4"}.fa-tencent-weibo:before{content:"\f1d5"}.fa-qq:before{content:"\f1d6"}.fa-wechat:before,.fa-weixin:before{content:"\f1d7"}.fa-paper-plane:before,.fa-send:before{content:"\f1d8"}.fa-paper-plane-o:before,.fa-send-o:before{content:"\f1d9"}.fa-history:before{content:"\f1da"}.fa-circle-thin:before{content:"\f1db"}.fa-header:before{content:"\f1dc"}.fa-paragraph:before{content:"\f1dd"}.fa-sliders:before{content:"\f1de"}.fa-share-alt:before{content:"\f1e0"}.fa-share-alt-square:b
efore{content:"\f1e1"}.fa-bomb:before{content:"\f1e2"}.book-langs-index{width:100%;height:100%;padding:40px 0;margin:0;overflow:auto}@media (max-width:600px){.book-langs-index{padding:0}}.book-langs-index .inner{max-width:600px;width:100%;margin:0 auto;padding:30px;background:#fff;border-radius:3px}.book-langs-index .inner h3{margin:0}.book-langs-index .inner .languages{list-style:none;padding:20px 30px;margin-top:20px;border-top:1px solid #eee}.book-langs-index .inner .languages:after,.book-langs-index .inner .languages:before{content:" ";display:table;line-height:0}.book-langs-index .inner .languages li{width:50%;float:left;padding:10px 5px;font-size:16px}@media (max-width:600px){.book-langs-index .inner .languages li{width:100%;max-width:100%}}.book .book-header{overflow:visible;height:50px;padding:0 8px;z-index:2;font-size:.85em;color:#7e888b;background:0 0}.book .book-header .btn{display:block;height:50px;padding:0 15px;border-bottom:none;color:#ccc;text-transform:uppercase;line-height:50px;-webkit-box-shadow:none!important;box-shadow:none!important;position:relative;font-size:14px}.book .book-header .btn:hover{position:relative;text-decoration:none;color:#444;background:0 0}.book .book-header h1{margin:0;font-size:20px;font-weight:200;text-align:center;line-height:50px;opacity:0;padding-left:200px;padding-right:200px;-webkit-transition:opacity .2s ease;-moz-transition:opacity .2s ease;-o-transition:opacity .2s ease;transition:opacity .2s ease;overflow:hidden;text-overflow:ellipsis;white-space:nowrap}.book .book-header h1 a,.book .book-header h1 a:hover{color:inherit;text-decoration:none}@media (max-width:1000px){.book .book-header h1{display:none}}.book .book-header h1 i{display:none}.book .book-header:hover h1{opacity:1}.book.is-loading .book-header h1 i{display:inline-block}.book.is-loading .book-header h1 a{display:none}.dropdown{position:relative}.dropdown-menu{position:absolute;top:100%;left:0;z-index:100;display:none;float:left;min-width:160px;padding:0;margin:2px 0 0;list-style:none;font-size:14px;background-color:#fafafa;border:1px solid rgba(0,0,0,.07);border-radius:1px;-webkit-box-shadow:0 6px 12px rgba(0,0,0,.175);box-shadow:0 6px 12px rgba(0,0,0,.175);background-clip:padding-box}.dropdown-menu.open{display:block}.dropdown-menu.dropdown-left{left:auto;right:4%}.dropdown-menu.dropdown-left .dropdown-caret{right:14px;left:auto}.dropdown-menu .dropdown-caret{position:absolute;top:-8px;left:14px;width:18px;height:10px;float:left;overflow:hidden}.dropdown-menu .dropdown-caret .caret-inner,.dropdown-menu .dropdown-caret .caret-outer{display:inline-block;top:0;border-left:9px solid transparent;border-right:9px solid transparent;position:absolute}.dropdown-menu .dropdown-caret .caret-outer{border-bottom:9px solid rgba(0,0,0,.1);height:auto;left:0;width:auto;margin-left:-1px}.dropdown-menu .dropdown-caret .caret-inner{margin-top:-1px;top:1px;border-bottom:9px solid #fafafa}.dropdown-menu .buttons{border-bottom:1px solid rgba(0,0,0,.07)}.dropdown-menu .buttons:after,.dropdown-menu .buttons:before{content:" ";display:table;line-height:0}.dropdown-menu .buttons:last-child{border-bottom:none}.dropdown-menu .buttons .button{border:0;background-color:transparent;color:#a6a6a6;width:100%;text-align:center;float:left;line-height:1.42857143;padding:8px 4px}.alert,.dropdown-menu .buttons .button:hover{color:#444}.dropdown-menu .buttons .button:focus,.dropdown-menu .buttons .button:hover{outline:0}.dropdown-menu .buttons .button.size-2{width:50%}.dropdown-menu .buttons 
.button.size-3{width:33%}.alert{padding:15px;margin-bottom:20px;background:#eee;border-bottom:5px solid #ddd}.alert-success{background:#dff0d8;border-color:#d6e9c6;color:#3c763d}.alert-info{background:#d9edf7;border-color:#bce8f1;color:#31708f}.alert-danger{background:#f2dede;border-color:#ebccd1;color:#a94442}.alert-warning{background:#fcf8e3;border-color:#faebcc;color:#8a6d3b}.book .book-summary{position:absolute;top:0;left:-300px;bottom:0;z-index:1;width:300px;color:#364149;background:#fafafa;border-right:1px solid rgba(0,0,0,.07);-webkit-transition:left 250ms ease;-moz-transition:left 250ms ease;-o-transition:left 250ms ease;transition:left 250ms ease}.book .book-summary ul.summary{position:absolute;top:0;left:0;right:0;bottom:0;overflow-y:auto;list-style:none;margin:0;padding:0;-webkit-transition:top .5s ease;-moz-transition:top .5s ease;-o-transition:top .5s ease;transition:top .5s ease}.book .book-summary ul.summary li{list-style:none}.book .book-summary ul.summary li.divider{height:1px;margin:7px 0;overflow:hidden;background:rgba(0,0,0,.07)}.book .book-summary ul.summary li i.fa-check{display:none;position:absolute;right:9px;top:16px;font-size:9px;color:#3c3}.book .book-summary ul.summary li.done>a{color:#364149;font-weight:400}.book .book-summary ul.summary li.done>a i{display:inline}.book .book-summary ul.summary li a,.book .book-summary ul.summary li span{display:block;padding:10px 15px;border-bottom:none;color:#364149;background:0 0;text-overflow:ellipsis;overflow:hidden;white-space:nowrap;position:relative}.book .book-summary ul.summary li span{cursor:not-allowed;opacity:.3;filter:alpha(opacity=30)}.book .book-summary ul.summary li a:hover,.book .book-summary ul.summary li.active>a{color:#008cff;background:0 0;text-decoration:none}.book .book-summary ul.summary li ul{padding-left:20px}@media (max-width:600px){.book .book-summary{width:calc(100% - 60px);bottom:0;left:-100%}}.book.with-summary .book-summary{left:0}.book.without-animation .book-summary{-webkit-transition:none!important;-moz-transition:none!important;-o-transition:none!important;transition:none!important}.book{position:relative;width:100%;height:100%}.book .book-body,.book .book-body .body-inner{position:absolute;top:0;left:0;overflow-y:auto;bottom:0;right:0}.book .book-body{color:#000;background:#fff;-webkit-transition:left 250ms ease;-moz-transition:left 250ms ease;-o-transition:left 250ms ease;transition:left 250ms ease}.book .book-body .page-wrapper{position:relative;outline:0}.book .book-body .page-wrapper .page-inner{max-width:800px;margin:0 auto;padding:20px 0 40px}.book .book-body .page-wrapper .page-inner section{margin:0;padding:5px 15px;background:#fff;border-radius:2px;line-height:1.7;font-size:1.6rem}.book .book-body .page-wrapper .page-inner .btn-group .btn{border-radius:0;background:#eee;border:0}@media (max-width:1240px){.book .book-body{-webkit-transition:-webkit-transform 250ms ease;-moz-transition:-moz-transform 250ms ease;-o-transition:-o-transform 250ms ease;transition:transform 250ms ease;padding-bottom:20px}.book .book-body .body-inner{position:static;min-height:calc(100% - 50px)}}@media (min-width:600px){.book.with-summary .book-body{left:300px}}@media (max-width:600px){.book.with-summary{overflow:hidden}.book.with-summary .book-body{-webkit-transform:translate(calc(100% - 60px),0);-moz-transform:translate(calc(100% - 60px),0);-ms-transform:translate(calc(100% - 60px),0);-o-transform:translate(calc(100% - 60px),0);transform:translate(calc(100% - 60px),0)}}.book.without-animation 
.book-body{-webkit-transition:none!important;-moz-transition:none!important;-o-transition:none!important;transition:none!important}.buttons:after,.buttons:before{content:" ";display:table;line-height:0}.button{border:0;background:#eee;color:#666;width:100%;text-align:center;float:left;line-height:1.42857143;padding:8px 4px}.button:hover{color:#444}.button:focus,.button:hover{outline:0}.button.size-2{width:50%}.button.size-3{width:33%}.book .book-body .page-wrapper .page-inner section{display:none}.book .book-body .page-wrapper .page-inner section.normal{display:block;word-wrap:break-word;overflow:hidden;color:#333;line-height:1.7;text-size-adjust:100%;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%;-moz-text-size-adjust:100%}.book .book-body .page-wrapper .page-inner section.normal *{box-sizing:border-box;-webkit-box-sizing:border-box;}.book .book-body .page-wrapper .page-inner section.normal>:first-child{margin-top:0!important}.book .book-body .page-wrapper .page-inner section.normal>:last-child{margin-bottom:0!important}.book .book-body .page-wrapper .page-inner section.normal blockquote,.book .book-body .page-wrapper .page-inner section.normal code,.book .book-body .page-wrapper .page-inner section.normal figure,.book .book-body .page-wrapper .page-inner section.normal img,.book .book-body .page-wrapper .page-inner section.normal pre,.book .book-body .page-wrapper .page-inner section.normal table,.book .book-body .page-wrapper .page-inner section.normal tr{page-break-inside:avoid}.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5,.book .book-body .page-wrapper .page-inner section.normal p{orphans:3;widows:3}.book .book-body .page-wrapper .page-inner section.normal h1,.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5{page-break-after:avoid}.book .book-body .page-wrapper .page-inner section.normal b,.book .book-body .page-wrapper .page-inner section.normal strong{font-weight:700}.book .book-body .page-wrapper .page-inner section.normal em{font-style:italic}.book .book-body .page-wrapper .page-inner section.normal blockquote,.book .book-body .page-wrapper .page-inner section.normal dl,.book .book-body .page-wrapper .page-inner section.normal ol,.book .book-body .page-wrapper .page-inner section.normal p,.book .book-body .page-wrapper .page-inner section.normal table,.book .book-body .page-wrapper .page-inner section.normal ul{margin-top:0;margin-bottom:.85em}.book .book-body .page-wrapper .page-inner section.normal a{color:#4183c4;text-decoration:none;background:0 0}.book .book-body .page-wrapper .page-inner section.normal a:active,.book .book-body .page-wrapper .page-inner section.normal a:focus,.book .book-body .page-wrapper .page-inner section.normal a:hover{outline:0;text-decoration:underline}.book .book-body .page-wrapper .page-inner section.normal img{border:0;max-width:100%}.book .book-body .page-wrapper .page-inner section.normal hr{height:4px;padding:0;margin:1.7em 0;overflow:hidden;background-color:#e7e7e7;border:none}.book .book-body .page-wrapper .page-inner section.normal hr:after,.book .book-body .page-wrapper .page-inner section.normal hr:before{display:table;content:" "}.book .book-body 
.page-wrapper .page-inner section.normal h1,.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5,.book .book-body .page-wrapper .page-inner section.normal h6{margin-top:1.275em;margin-bottom:.85em;}.book .book-body .page-wrapper .page-inner section.normal h1{font-size:2em}.book .book-body .page-wrapper .page-inner section.normal h2{font-size:1.75em}.book .book-body .page-wrapper .page-inner section.normal h3{font-size:1.5em}.book .book-body .page-wrapper .page-inner section.normal h4{font-size:1.25em}.book .book-body .page-wrapper .page-inner section.normal h5{font-size:1em}.book .book-body .page-wrapper .page-inner section.normal h6{font-size:1em;color:#777}.book .book-body .page-wrapper .page-inner section.normal code,.book .book-body .page-wrapper .page-inner section.normal pre{font-family:Consolas,"Liberation Mono",Menlo,Courier,monospace;direction:ltr;border:none;color:inherit}.book .book-body .page-wrapper .page-inner section.normal pre{overflow:auto;word-wrap:normal;margin:0 0 1.275em;padding:.85em 1em;background:#f7f7f7}.book .book-body .page-wrapper .page-inner section.normal pre>code{display:inline;max-width:initial;padding:0;margin:0;overflow:initial;line-height:inherit;font-size:.85em;white-space:pre;background:0 0}.book .book-body .page-wrapper .page-inner section.normal pre>code:after,.book .book-body .page-wrapper .page-inner section.normal pre>code:before{content:normal}.book .book-body .page-wrapper .page-inner section.normal code{padding:.2em;margin:0;font-size:.85em;background-color:#f7f7f7}.book .book-body .page-wrapper .page-inner section.normal code:after,.book .book-body .page-wrapper .page-inner section.normal code:before{letter-spacing:-.2em;content:"\00a0"}.book .book-body .page-wrapper .page-inner section.normal ol,.book .book-body .page-wrapper .page-inner section.normal ul{padding:0 0 0 2em;margin:0 0 .85em}.book .book-body .page-wrapper .page-inner section.normal ol ol,.book .book-body .page-wrapper .page-inner section.normal ol ul,.book .book-body .page-wrapper .page-inner section.normal ul ol,.book .book-body .page-wrapper .page-inner section.normal ul ul{margin-top:0;margin-bottom:0}.book .book-body .page-wrapper .page-inner section.normal ol ol{list-style-type:lower-roman}.book .book-body .page-wrapper .page-inner section.normal blockquote{margin:0 0 .85em;padding:0 15px;opacity:0.75;border-left:4px solid #dcdcdc}.book .book-body .page-wrapper .page-inner section.normal blockquote:first-child{margin-top:0}.book .book-body .page-wrapper .page-inner section.normal blockquote:last-child{margin-bottom:0}.book .book-body .page-wrapper .page-inner section.normal dl{padding:0}.book .book-body .page-wrapper .page-inner section.normal dl dt{padding:0;margin-top:.85em;font-style:italic;font-weight:700}.book .book-body .page-wrapper .page-inner section.normal dl dd{padding:0 .85em;margin-bottom:.85em}.book .book-body .page-wrapper .page-inner section.normal dd{margin-left:0}.book .book-body .page-wrapper .page-inner section.normal .glossary-term{cursor:help;text-decoration:underline}.book .book-body .navigation{position:absolute;top:50px;bottom:0;margin:0;max-width:150px;min-width:90px;display:flex;justify-content:center;align-content:center;flex-direction:column;font-size:40px;color:#ccc;text-align:center;-webkit-transition:all 350ms ease;-moz-transition:all 350ms 
ease;-o-transition:all 350ms ease;transition:all 350ms ease}.book .book-body .navigation:hover{text-decoration:none;color:#444}.book .book-body .navigation.navigation-next{right:0}.book .book-body .navigation.navigation-prev{left:0}@media (max-width:1240px){.book .book-body .navigation{position:static;top:auto;max-width:50%;width:50%;display:inline-block;float:left}.book .book-body .navigation.navigation-unique{max-width:100%;width:100%}}.book .book-body .page-wrapper .page-inner section.glossary{margin-bottom:40px}.book .book-body .page-wrapper .page-inner section.glossary h2 a,.book .book-body .page-wrapper .page-inner section.glossary h2 a:hover{color:inherit;text-decoration:none}.book .book-body .page-wrapper .page-inner section.glossary .glossary-index{list-style:none;margin:0;padding:0}.book .book-body .page-wrapper .page-inner section.glossary .glossary-index li{display:inline;margin:0 8px;white-space:nowrap}*{-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box;-webkit-overflow-scrolling:touch;-webkit-tap-highlight-color:transparent;-webkit-text-size-adjust:none;-webkit-touch-callout:none}a{text-decoration:none}body,html{height:100%}html{font-size:62.5%}body{text-rendering:optimizeLegibility;font-smoothing:antialiased;font-family:"Helvetica Neue",Helvetica,Arial,sans-serif;font-size:14px;letter-spacing:.2px;text-size-adjust:100%} +.book .book-summary ul.summary li a span {display:inline;padding:initial;overflow:visible;cursor:auto;opacity:1;} diff --git a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/js/app.min.js b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/js/app.min.js new file mode 100644 index 000000000..9ace197e9 --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/js/app.min.js @@ -0,0 +1,6 @@ +(function e(t,n,r){function s(o,u){if(!n[o]){if(!t[o]){var a=typeof require=="function"&&require;if(!u&&a)return a(o,!0);if(i)return i(o,!0);var f=new Error("Cannot find module '"+o+"'");throw f.code="MODULE_NOT_FOUND",f}var l=n[o]={exports:{}};t[o][0].call(l.exports,function(e){var n=t[o][1][e];return s(n?n:e)},l,l.exports,e,t,n,r)}return n[o].exports}var i=typeof require=="function"&&require;for(var o=0;o"'`]/g,reHasEscapedHtml=RegExp(reEscapedHtml.source),reHasUnescapedHtml=RegExp(reUnescapedHtml.source);var reEscape=/<%-([\s\S]+?)%>/g,reEvaluate=/<%([\s\S]+?)%>/g,reInterpolate=/<%=([\s\S]+?)%>/g;var reIsDeepProp=/\.|\[(?:[^[\]]*|(["'])(?:(?!\1)[^\n\\]|\\.)*?\1)\]/,reIsPlainProp=/^\w*$/,rePropName=/[^.[\]]+|\[(?:(-?\d+(?:\.\d+)?)|(["'])((?:(?!\2)[^\n\\]|\\.)*?)\2)\]/g;var reRegExpChars=/^[:!,]|[\\^$.*+?()[\]{}|\/]|(^[0-9a-fA-Fnrtuvx])|([\n\r\u2028\u2029])/g,reHasRegExpChars=RegExp(reRegExpChars.source);var reComboMark=/[\u0300-\u036f\ufe20-\ufe23]/g;var reEscapeChar=/\\(\\)?/g;var reEsTemplate=/\$\{([^\\}]*(?:\\.[^\\}]*)*)\}/g;var reFlags=/\w*$/;var reHasHexPrefix=/^0[xX]/;var reIsHostCtor=/^\[object .+?Constructor\]$/;var reIsUint=/^\d+$/;var reLatin1=/[\xc0-\xd6\xd8-\xde\xdf-\xf6\xf8-\xff]/g;var reNoMatch=/($^)/;var reUnescapedString=/['\n\r\u2028\u2029\\]/g;var reWords=function(){var upper="[A-Z\\xc0-\\xd6\\xd8-\\xde]",lower="[a-z\\xdf-\\xf6\\xf8-\\xff]+";return RegExp(upper+"+(?="+upper+lower+")|"+upper+"?"+lower+"|"+upper+"+|[0-9]+","g")}();var 
contextProps=["Array","ArrayBuffer","Date","Error","Float32Array","Float64Array","Function","Int8Array","Int16Array","Int32Array","Math","Number","Object","RegExp","Set","String","_","clearTimeout","isFinite","parseFloat","parseInt","setTimeout","TypeError","Uint8Array","Uint8ClampedArray","Uint16Array","Uint32Array","WeakMap"];var templateCounter=-1;var typedArrayTags={};typedArrayTags[float32Tag]=typedArrayTags[float64Tag]=typedArrayTags[int8Tag]=typedArrayTags[int16Tag]=typedArrayTags[int32Tag]=typedArrayTags[uint8Tag]=typedArrayTags[uint8ClampedTag]=typedArrayTags[uint16Tag]=typedArrayTags[uint32Tag]=true;typedArrayTags[argsTag]=typedArrayTags[arrayTag]=typedArrayTags[arrayBufferTag]=typedArrayTags[boolTag]=typedArrayTags[dateTag]=typedArrayTags[errorTag]=typedArrayTags[funcTag]=typedArrayTags[mapTag]=typedArrayTags[numberTag]=typedArrayTags[objectTag]=typedArrayTags[regexpTag]=typedArrayTags[setTag]=typedArrayTags[stringTag]=typedArrayTags[weakMapTag]=false;var cloneableTags={};cloneableTags[argsTag]=cloneableTags[arrayTag]=cloneableTags[arrayBufferTag]=cloneableTags[boolTag]=cloneableTags[dateTag]=cloneableTags[float32Tag]=cloneableTags[float64Tag]=cloneableTags[int8Tag]=cloneableTags[int16Tag]=cloneableTags[int32Tag]=cloneableTags[numberTag]=cloneableTags[objectTag]=cloneableTags[regexpTag]=cloneableTags[stringTag]=cloneableTags[uint8Tag]=cloneableTags[uint8ClampedTag]=cloneableTags[uint16Tag]=cloneableTags[uint32Tag]=true;cloneableTags[errorTag]=cloneableTags[funcTag]=cloneableTags[mapTag]=cloneableTags[setTag]=cloneableTags[weakMapTag]=false;var deburredLetters={"À":"A","Á":"A","Â":"A","Ã":"A","Ä":"A","Å":"A","à":"a","á":"a","â":"a","ã":"a","ä":"a","å":"a","Ç":"C","ç":"c","Ð":"D","ð":"d","È":"E","É":"E","Ê":"E","Ë":"E","è":"e","é":"e","ê":"e","ë":"e","Ì":"I","Í":"I","Î":"I","Ï":"I","ì":"i","í":"i","î":"i","ï":"i","Ñ":"N","ñ":"n","Ò":"O","Ó":"O","Ô":"O","Õ":"O","Ö":"O","Ø":"O","ò":"o","ó":"o","ô":"o","õ":"o","ö":"o","ø":"o","Ù":"U","Ú":"U","Û":"U","Ü":"U","ù":"u","ú":"u","û":"u","ü":"u","Ý":"Y","ý":"y","ÿ":"y","Æ":"Ae","æ":"ae","Þ":"Th","þ":"th","ß":"ss"};var htmlEscapes={"&":"&","<":"<",">":">",'"':""","'":"'","`":"`"};var htmlUnescapes={"&":"&","<":"<",">":">",""":'"',"'":"'","`":"`"};var objectTypes={"function":true,object:true};var regexpEscapes={0:"x30",1:"x31",2:"x32",3:"x33",4:"x34",5:"x35",6:"x36",7:"x37",8:"x38",9:"x39",A:"x41",B:"x42",C:"x43",D:"x44",E:"x45",F:"x46",a:"x61",b:"x62",c:"x63",d:"x64",e:"x65",f:"x66",n:"x6e",r:"x72",t:"x74",u:"x75",v:"x76",x:"x78"};var stringEscapes={"\\":"\\","'":"'","\n":"n","\r":"r","\u2028":"u2028","\u2029":"u2029"};var freeExports=objectTypes[typeof exports]&&exports&&!exports.nodeType&&exports;var freeModule=objectTypes[typeof module]&&module&&!module.nodeType&&module;var freeGlobal=freeExports&&freeModule&&typeof global=="object"&&global&&global.Object&&global;var freeSelf=objectTypes[typeof self]&&self&&self.Object&&self;var freeWindow=objectTypes[typeof window]&&window&&window.Object&&window;var moduleExports=freeModule&&freeModule.exports===freeExports&&freeExports;var root=freeGlobal||freeWindow!==(this&&this.window)&&freeWindow||freeSelf||this;function baseCompareAscending(value,other){if(value!==other){var valIsNull=value===null,valIsUndef=value===undefined,valIsReflexive=value===value;var othIsNull=other===null,othIsUndef=other===undefined,othIsReflexive=other===other;if(value>other&&!othIsNull||!valIsReflexive||valIsNull&&!othIsUndef&&othIsReflexive||valIsUndef&&othIsReflexive){return 1}if(value-1){}return index}function 
diff --git a/previous_versions/v0.4.0/9-confidence-intervals.html b/previous_versions/v0.4.0/9-confidence-intervals.html
new file mode 100644
index 000000000..00a84b19c
--- /dev/null
+++ b/previous_versions/v0.4.0/9-confidence-intervals.html
@@ -0,0 +1,1806 @@
+ 9 Confidence Intervals | An Introduction to Statistical and Data Sciences via R
      9 Confidence Intervals

      +

In Chapter 8, we explored the process of taking repeated samples from a population to build a sampling distribution. The motivation there was to use multiple samples from the same population to visualize and attempt to understand the variability in the statistic from one sample to another. Furthermore, recall our concepts and terminology related to sampling from the beginning of Chapter 8:

      +

Generally speaking, we learned that if a sample of size \(n\) is collected at random, then the resulting sample is unbiased and representative of the population, thus any result based on the sample can generalize to the population, and hence the point estimate/sample statistic computed from this sample is a “good guess” of the unknown population parameter of interest.

      +

      Specific to the bowl, we learned that if we properly mix the balls first thereby ensuring the randomness of samples extracted using the shovel with \(n=50\) slots, then the contents of the shovel will “look like” the contents of the bowl, thus any results based on the sample of \(n=50\) balls can generalize to the large bowl of \(N=2400\) balls, and hence the sample proportion red \(\widehat{p}\) of the \(n=50\) balls in the shovel is a “good guess” of the true population proportion red \(p\) of the \(N=2400\) balls in the bowl.

      +

      We emphasize that we used a point estimate/sample statistic, in this case the sample proportion \(\widehat{p}\), to estimate the unknown value of the population parameter, in this case the population proportion \(p\). In other words, we are using the sample to infer about the population.

      +

We can however consider inferential situations other than just those involving proportions. We present a wide array of such scenarios in the table below. In all 7 cases, the point estimate/sample statistic estimates the unknown population parameter. It does so by computing summary statistics based on a sample of size \(n\), as in the short code sketch following the table.

| Scenario | Population parameter | Population Notation | Point estimate/sample statistic | Sample Notation |
|---|---|---|---|---|
| 1 | Population proportion | \(p\) | Sample proportion | \(\widehat{p}\) |
| 2 | Population mean | \(\mu\) | Sample mean | \(\overline{x}\) |
| 3 | Difference in population proportions | \(p_1 - p_2\) | Difference in sample proportions | \(\widehat{p}_1 - \widehat{p}_2\) |
| 4 | Difference in population means | \(\mu_1 - \mu_2\) | Difference in sample means | \(\overline{x}_1 - \overline{x}_2\) |
| 5 | Population standard deviation | \(\sigma\) | Sample standard deviation | \(s\) |
| 6 | Population regression intercept | \(\beta_0\) | Sample regression intercept | \(\widehat{\beta}_0\) or \(b_0\) |
| 7 | Population regression slope | \(\beta_1\) | Sample regression slope | \(\widehat{\beta}_1\) or \(b_1\) |
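For concreteness, here is a minimal dplyr sketch of scenario 2, using the pennies_sample data frame from the moderndive package that we explore later in this chapter; scenario 1 works the same way, just with a categorical variable and a proportion instead of a mean. The object name x_bar is only illustrative.

# Scenario 2: the sample mean is the point estimate of the unknown population mean
pennies_sample %>%
  summarize(x_bar = mean(age_in_2011))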

We’ll cover the first four scenarios in this chapter on confidence intervals and in the following chapter on hypothesis testing:

      +
        +
• Scenario 2 about means. Ex: the average age of pennies.
• Scenario 3 about differences in proportions between two groups. Ex: the difference in high school completion rates for Canadians vs non-Canadians. We call this a situation of two-sample inference.
• Scenario 4 is similar to 3, but it’s about the means of two groups. Ex: the difference in average test scores for the morning section of a class versus the afternoon section of a class. This is another situation of two-sample inference.
      +

In contrast to these, Scenario 5 involves a measure of spread: the standard deviation. Does the spread/variability of a sample match the spread/variability of the population? We leave this topic, however, for an intermediate course on statistical inference.

      +

In Chapter 11 on inference for regression, we’ll cover Scenarios 6 & 7 about the regression line. In particular, we’ll see that the fitted regression line from Chapter 6 on basic regression, \(\widehat{y} = b_0 + b_1 \cdot x\), is in fact an estimate of some true population regression line \(y = \beta_0 + \beta_1 \cdot x\) based on a sample of \(n\) pairs of points \((x, y)\). Ex: Recall our sample of \(n=463\) instructors at UT Austin from the evals data set in Chapter 6. Based on the results of the fitted regression model of teaching score with beauty score as an explanatory/predictor variable, what can we say about this relationship for all instructors, not just those at UT Austin?

      +

In most cases, we don’t have the population values as we did with the bowl of balls. We only have a single sample of data from a larger population. We’d like to be able to use that single sample to make reasonable guesses about population parameters, that is, to create a range of plausible values for a population parameter. This range of plausible values is known as a confidence interval and will be the focus of this chapter. And how do we use a single sample to get some idea of how other samples might vary in terms of their statistic values? One common way this is done is via a process known as bootstrapping, which will be the focus of the beginning sections of this chapter.

      +
      +

      Needed packages

      +

      Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). If needed, read Section 2.3 for information on how to install and load R packages.

      +
      library(dplyr)
      +library(ggplot2)
      +library(janitor)
      +library(moderndive)
      +library(infer)
      +
      +
      +

      DataCamp

      +

      Our approach of using data science tools to understand the first major component of statistical inference, confidence intervals, uses the same tools as in Mine Cetinkaya-Rundel and Andrew Bray’s DataCamp courses “Inference for Numerical Data” and “Inference for Categorical Data.” If you’re interested in complementing your learning below in an interactive online environment, click on the images below to access the courses.

      +
[Images linking to the DataCamp courses “Inference for Numerical Data” and “Inference for Categorical Data”]
      +
      +
      +

      9.1 Bootstrapping

      +
      +

      9.1.1 Data explanation

      +

      The moderndive package contains a sample of 40 pennies collected and minted in the United States. Let’s explore this sample data first:

      +
      pennies_sample
      +
      # A tibble: 40 x 2
      +    year age_in_2011
      +   <int>       <int>
      + 1  2005           6
      + 2  1981          30
      + 3  1977          34
      + 4  1992          19
      + 5  2005           6
      + 6  2006           5
      + 7  2000          11
      + 8  1992          19
      + 9  1988          23
      +10  1996          15
      +# … with 30 more rows
      +

The pennies_sample data frame has 40 rows, each corresponding to a single penny, with two variables:

      +
        +
• year of minting as shown on the penny and
• age_in_2011 giving the age of the penny in 2011, i.e., the number of years it had been in circulation, as an integer, e.g. 15, 2, etc.
      +

      Suppose we are interested in understanding some properties of the mean age of all US pennies from this data collected in 2011. How might we go about that? Let’s begin by understanding some of the properties of pennies_sample using data wrangling from Chapter 5 and data visualization from Chapter 3.

      +
      +
      +

      9.1.2 Exploratory data analysis

      +

      First, let’s visualize the values in this sample as a histogram:

      +
      ggplot(pennies_sample, aes(x = age_in_2011)) +
      +  geom_histogram(bins = 10, color = "white")
      +

      +

We see a roughly symmetric distribution here that has quite a few values near 20 years in age with only a few larger than 40 years or smaller than 5 years. If pennies_sample is a representative sample from the population, we’d expect the distribution of the ages of all US pennies in 2011 to have a similar shape, a similar spread, and similar measures of central tendency like the mean.

      +

So where does the mean value fall for this sample? This value will be known as our point estimate and provides us with a single number that could serve as a guess of what the true population mean age might be. Recall how to find this using the dplyr package:

      +
      x_bar <- pennies_sample %>% 
      +  summarize(stat = mean(age_in_2011))
      +x_bar
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  25.1
      +

We’ve denoted this sample mean as \(\bar{x}\), which is the standard symbol for denoting the mean of a sample. Our point estimate is, thus, \(\bar{x} = 25.1\). Note, though, that this is just one sample, providing just one guess at the population mean. What if we’d like to have another guess?

      +

This should all sound similar to what we did in Chapter 8. There, instead of collecting just a single scoop of balls, we had many different students use the shovel to scoop different samples of red and white balls. We then calculated a sample statistic (the sample proportion) from each sample. But we don’t have a population to pull from here with the pennies. We only have this one sample.

      +

The process of bootstrapping allows us to use a single sample to generate many different samples, which act as our way of approximating the sampling distribution via a created bootstrap distribution instead. We will pull ourselves up by our bootstraps using a single sample (pennies_sample) to get an idea of the grander sampling distribution.

      +
      +
      +

      9.1.3 The Bootstrapping Process

      +

      Bootstrapping uses a process of sampling with replacement from our original sample to create new bootstrap samples of the same size as our original sample. We can again make use of the rep_sample_n() function to explore what one such bootstrap sample would look like. Remember that we are randomly sampling from the original sample here with replacement and that we always use the same sample size for the bootstrap samples as the size of the original sample (pennies_sample).

      +
      bootstrap_sample1 <- pennies_sample %>% 
      +  rep_sample_n(size = 40, replace = TRUE, reps = 1)
      +bootstrap_sample1
      +
      # A tibble: 40 x 3
      +# Groups:   replicate [1]
      +   replicate  year age_in_2011
      +       <int> <int>       <int>
      + 1         1  1983          28
      + 2         1  2000          11
      + 3         1  2004           7
      + 4         1  1981          30
      + 5         1  1993          18
      + 6         1  2006           5
      + 7         1  1981          30
      + 8         1  2004           7
      + 9         1  1992          19
      +10         1  1994          17
      +# … with 30 more rows
      +

      Let’s visualize what this new bootstrap sample looks like:

      +
      ggplot(bootstrap_sample1, aes(x = age_in_2011)) +
      +  geom_histogram(bins = 10, color = "white")
      +

      +

We now have another sample that we can treat as if it came from the population of interest. We can similarly calculate the sample mean of this bootstrap sample, called a bootstrap statistic.

      +
      bootstrap_sample1 %>% 
      +  summarize(stat = mean(age_in_2011))
      +
      # A tibble: 1 x 2
      +  replicate  stat
      +      <int> <dbl>
      +1         1  23.2
      +

      We can see that this sample mean is smaller than the x_bar value we calculated earlier for the pennies_sample data. We’ll come back to analyzing the different bootstrap statistic values shortly.

      +

      Let’s recap what was done to get to this bootstrap sample using a tactile explanation:

      +
        +
1. First, pretend that each of the 40 values of age_in_2011 in pennies_sample were written on a small piece of paper. Recall that these values were 6, 30, 34, 19, 6, etc.
2. Now, put the 40 small pieces of paper into a receptacle such as a baseball cap.
3. Shake up the pieces of paper.
4. Draw “at random” from the cap to select one piece of paper.
5. Write down the value on this piece of paper. Say that it is 28.
6. Now, place this piece of paper containing 28 back into the cap.
7. Draw “at random” again from the cap to select a piece of paper. Note that this is the sampling with replacement part since you may draw 28 again.
8. Repeat this process until you have drawn 40 pieces of paper and written down the values on these 40 pieces of paper. Completing this repetition produces ONE bootstrap sample.
      +

      If you look at the values in bootstrap_sample1, you can see how this process plays out. We originally drew 28, then we drew 11, then 7, and so on. Of course, we didn’t actually use pieces of paper and a cap here. We just had the computer perform this process for us to produce bootstrap_sample1 using rep_sample_n() with replace = TRUE set.
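If it helps to connect the tactile description to code, here is a minimal base R sketch of drawing one such bootstrap sample "by hand." The object names slips_of_paper and one_bootstrap are made up for illustration, and the draws will differ from bootstrap_sample1 since a different random mechanism is used.

# A sketch of one tactile bootstrap sample using base R's sample()
# (illustrative only; rep_sample_n() with replace = TRUE is what was used above)
slips_of_paper <- pennies_sample$age_in_2011   # the 40 ages written on paper slips
one_bootstrap  <- sample(slips_of_paper, size = 40, replace = TRUE)
mean(one_bootstrap)                            # one bootstrap statistic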

      +

      The process of sampling with replacement is how we can use the original sample to take a guess as to what other values in the population may be. Sometimes in these bootstrap samples, we will select lots of larger values from the original sample, sometimes we will select lots of smaller values, and most frequently we will select values that are near the center of the sample. Let’s explore what the distribution of values of age_in_2011 for six different bootstrap samples looks like to further understand this variability.

      +
      six_bootstrap_samples <- pennies_sample %>% 
      +  rep_sample_n(size = 40, replace = TRUE, reps = 6)
      +
      ggplot(six_bootstrap_samples, aes(x = age_in_2011)) +
      +  geom_histogram(bins = 10, color = "white") +
      +  facet_wrap(~ replicate)
      +

      +

      We can also look at the six different means using dplyr syntax:

      +
      six_bootstrap_samples %>% 
      +  group_by(replicate) %>% 
      +  summarize(stat = mean(age_in_2011))
      +
      # A tibble: 6 x 2
      +  replicate  stat
      +      <int> <dbl>
      +1         1  23.6
      +2         2  24.1
      +3         3  25.2
      +4         4  23.1
      +5         5  24.0
      +6         6  24.7
      +

      Instead of doing this six times, we could do it 1000 times and then look at the distribution of stat across all 1000 of the replicates. This sets the stage for the infer R package (Bray et al. 2019) that was created to help users perform statistical inference such as confidence intervals and hypothesis tests using verbs similar to what you’ve seen with dplyr. We’ll walk through setting up each of the infer verbs for confidence intervals using this pennies_sample example, while also explaining the purpose of the verbs in a general framework.
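Before turning to infer, here is a sketch of what those 1000 replicates would look like using only the rep_sample_n() and dplyr tools from above; the name virtual_bootstrap_means and the bin count are arbitrary choices for illustration.

virtual_bootstrap_means <- pennies_sample %>% 
  rep_sample_n(size = 40, replace = TRUE, reps = 1000) %>% 
  group_by(replicate) %>% 
  summarize(stat = mean(age_in_2011))

# Histogram of the 1000 bootstrap means
ggplot(virtual_bootstrap_means, aes(x = stat)) +
  geom_histogram(bins = 10, color = "white")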

      +
      +
      +
      +

      9.2 The infer package for statistical inference

      +

      The infer package makes great use of the %>% to create a pipeline for statistical inference. The goal of the package is to provide a way for its users to explain the computational process of confidence intervals and hypothesis tests using the code as a guide. The verbs build in order here, so you’ll want to start with specify() and then continue through the others as needed.

      +
      +

      9.2.1 Specify variables

      +

      +

The specify() function is used primarily to choose which variables will be the focus of the statistical inference. In addition, this is where you set which variable will act as the explanatory variable and which will act as the response variable. For proportion problems similar to those in Chapter 8, we can also specify which of the different levels we would like to count as a success. We’ll see further examples of these options in this chapter, Chapter 10, and in Appendix B.

      +

      To begin to create a confidence interval for the population mean age of US pennies in 2011, we start by using specify() to choose which variable in our pennies_sample data we’d like to work with. This can be done in one of two ways:

      +
1. Using the response argument:
      +
      pennies_sample %>% 
      +  specify(response = age_in_2011)
      +
      Response: age_in_2011 (integer)
      +# A tibble: 40 x 1
      +   age_in_2011
      +         <int>
      + 1           6
      + 2          30
      + 3          34
      + 4          19
      + 5           6
      + 6           5
      + 7          11
      + 8          19
      + 9          23
      +10          15
      +# … with 30 more rows
      +
2. Using formula notation:
      +
      pennies_sample %>% 
      +  specify(formula = age_in_2011 ~ NULL)
      +
      Response: age_in_2011 (integer)
      +# A tibble: 40 x 1
      +   age_in_2011
      +         <int>
      + 1           6
      + 2          30
      + 3          34
      + 4          19
      + 5           6
      + 6           5
      + 7          11
      + 8          19
      + 9          23
      +10          15
      +# … with 30 more rows
      +

      Note that the formula notation uses the common R methodology to include the response \(y\) variable on the left of the ~ and the explanatory \(x\) variable on the right of the “tilde.” Recall that you used this notation frequently with the lm() function in Chapters 6 and 7 when fitting regression models. Either notation works just fine, but a preference is usually given here for the formula notation to further build on the ideas from earlier chapters.

      +
      +
      +

      9.2.2 Generate replicates

      +

      +

      After specify()ing the variables we’d like in our inferential analysis, we next feed that into the generate() verb. The generate() verb’s main argument is reps, which is used to give how many different repetitions one would like to perform. Another argument here is type, which is automatically determined by the kinds of variables passed into specify(). We can also be explicit and set this type to be type = "bootstrap". This type argument will be further used in hypothesis testing in Chapter 10 as well. Make sure to check out ?generate to see the options here and use the ? operator to better understand other verbs as well.
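As a small sketch of making that choice explicit, the pipeline below sets type = "bootstrap" by hand; based on the description above, the result should be equivalent to letting specify() determine the type automatically.

pennies_sample %>% 
  specify(response = age_in_2011) %>% 
  generate(reps = 1000, type = "bootstrap")   # 40 rows for each of the 1000 replicates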

      +

      Let’s generate() 1000 bootstrap samples:

      +
      thousand_bootstrap_samples <- pennies_sample %>% 
      +  specify(response = age_in_2011) %>% 
      +  generate(reps = 1000)
      +

      We can use the dplyr count() function to help us understand what the thousand_bootstrap_samples data frame looks like:

      +
      thousand_bootstrap_samples %>% count(replicate)
      +
      # A tibble: 1,000 x 2
      +# Groups:   replicate [1,000]
      +   replicate     n
      +       <int> <int>
      + 1         1    40
      + 2         2    40
      + 3         3    40
      + 4         4    40
      + 5         5    40
      + 6         6    40
      + 7         7    40
      + 8         8    40
      + 9         9    40
      +10        10    40
      +# … with 990 more rows
      +

      Notice that each replicate has 40 entries here. Now that we have 1000 different bootstrap samples, our next step is to calculate the bootstrap statistics for each sample.

      +
      +
      +

      9.2.3 Calculate summary statistics

      +

      +

      After generate()ing many different samples, we next want to condense those samples down into a single statistic for each replicated sample. As seen in the diagram, the calculate() function is helpful here.

      +

      As we did at the beginning of this chapter, we now want to calculate the mean age_in_2011 for each bootstrap sample. To do so, we use the stat argument and set it to "mean" below. The stat argument has a variety of different options here and we will see further examples of this throughout the remaining chapters.

      +
      bootstrap_distribution <- pennies_sample %>% 
      +  specify(response = age_in_2011) %>% 
      +  generate(reps = 1000) %>% 
      +  calculate(stat = "mean")
      +bootstrap_distribution
      +
      # A tibble: 1,000 x 2
      +   replicate  stat
      +       <int> <dbl>
      + 1         1  26.5
      + 2         2  25.4
      + 3         3  26.0
      + 4         4  26  
      + 5         5  25.2
      + 6         6  29.0
      + 7         7  22.8
      + 8         8  26.4
      + 9         9  24.9
      +10        10  28.1
      +# … with 990 more rows
      +

      We see that the resulting data has 1000 rows and 2 columns corresponding to the 1000 replicates and the mean for each bootstrap sample.
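As a sketch of the flexibility of the stat argument mentioned above, swapping in a different summary statistic only requires changing that one argument; for instance, a bootstrap distribution of sample medians could be built as follows (the name bootstrap_medians is hypothetical).

bootstrap_medians <- pennies_sample %>% 
  specify(response = age_in_2011) %>% 
  generate(reps = 1000, type = "bootstrap") %>% 
  calculate(stat = "median")   # one median per replicate instead of one mean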

      +
      +

      Observed statistic / point estimate calculations

      +

Just as group_by() %>% summarize() produces a useful workflow in dplyr, we can also use specify() %>% calculate() to compute summary measures on our original sample data. It’s often helpful, both in confidence interval calculations and in hypothesis testing, to identify what the corresponding statistic is in the original data. For our example on penny age, we computed above a value of x_bar using the summarize() verb in dplyr:

      +
      pennies_sample %>% 
      +  summarize(stat = mean(age_in_2011))
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  25.1
      +

This can also be done by skipping the generate() step in the pipeline, feeding specify() directly into calculate():

      +
      pennies_sample %>% 
      +  specify(response = age_in_2011) %>% 
      +  calculate(stat = "mean")
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  25.1
      +

This shortcut will be particularly useful when the calculation of the observed statistic is tricky to do using dplyr alone. This is especially the case when working with more than one variable, as will be seen in Chapter 10.

      +
      +
      +
      +

      9.2.4 Visualize the results

      +

      +

      The visualize() verb provides a simple way to view the bootstrap distribution as a histogram of the stat variable values. It has many other arguments that one can use as well including the shading of the histogram values corresponding to the confidence interval values.

      +
      bootstrap_distribution %>% visualize()
      +

      +

      The shape of this resulting distribution may look familiar to you. It resembles the well-known normal (bell-shaped) curve.
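Since bootstrap_distribution is an ordinary data frame with a stat column, roughly the same picture can be drawn directly with ggplot2. This is just a sketch showing that visualize() is a convenience, not a requirement; the bin count here is an arbitrary choice.

# An equivalent histogram of the bootstrap statistics using ggplot2 directly
ggplot(bootstrap_distribution, aes(x = stat)) +
  geom_histogram(bins = 15, color = "white")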

      +

      The following diagram recaps the infer pipeline for creating a bootstrap distribution.

      +

      +
      +
      +
      +

      9.3 Now to confidence intervals

      +

      Definition: Confidence Interval

      +

      A confidence interval gives a range of plausible values for a parameter. It depends on a specified confidence level with higher confidence levels corresponding to wider confidence intervals and lower confidence levels corresponding to narrower confidence intervals. Common confidence levels include 90%, 95%, and 99%.

      +

      Usually we don’t just begin sections with a definition, but confidence intervals are simple to define and play an important role in the sciences and any field that uses data. You can think of a confidence interval as playing the role of a net when fishing. Instead of just trying to catch a fish with a single spear (estimating an unknown parameter by using a single point estimate/statistic), we can use a net to try to provide a range of possible locations for the fish (use a range of possible values based around our statistic to make a plausible guess as to the location of the parameter).

      +

      The bootstrapping process will provide bootstrap statistics that have a bootstrap distribution with center at (or extremely close to) the mean of the original sample. This can be seen by giving the observed statistic obs_stat argument the value of the point estimate x_bar.

      +
      bootstrap_distribution %>% visualize(obs_stat = x_bar)
      +

      +

      We can also compute the mean of the bootstrap distribution of means to see how it compares to x_bar:

      +
      bootstrap_distribution %>% 
      +  summarize(mean_of_means = mean(stat))
      +
      # A tibble: 1 x 1
      +  mean_of_means
      +          <dbl>
      +1          25.1
      +

In this case, we can see that the bootstrap distribution provides us with a guess as to what the variability in different sample means may look like, using only the original sample as our guide. We can quantify this variability in the form of a 95% confidence interval in a couple of different ways.

      +
      +

      9.3.1 The percentile method

      +

      One way to calculate a range of plausible values for the unknown mean age of coins in 2011 is to use the middle 95% of the bootstrap_distribution to determine our endpoints. Our endpoints are thus at the 2.5th and 97.5th percentiles. This can be done with infer using the get_ci() function. (You can also use the conf_int() or get_confidence_interval() functions here as they are aliases that work the exact same way.)

      +
      bootstrap_distribution %>% 
      +  get_ci(level = 0.95, type = "percentile")
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   21.0    29.3
      +

      These options are the default values for level and type so we can also just do:

      +
      percentile_ci <- bootstrap_distribution %>% 
      +  get_ci()
      +percentile_ci
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   21.0    29.3
      +

      Using the percentile method, our range of plausible values for the mean age of US pennies in circulation in 2011 is 20.972 years to 29.252 years. We can use the visualize() function to view this using the endpoints and direction arguments, setting direction to "between" (between the values) and endpoints to be those stored with name percentile_ci.

      +
      bootstrap_distribution %>% 
      +  visualize(endpoints = percentile_ci, direction = "between")
      +

      +

      You can see that 95% of the data stored in the stat variable in bootstrap_distribution falls between the two endpoints with 2.5% to the left outside of the shading and 2.5% to the right outside of the shading. The cut-off points that provide our range are shown with the darker lines.

      +
      +
      +

      9.3.2 The standard error method

      +

      If the bootstrap distribution is close to symmetric and bell-shaped, we can also use a shortcut formula for determining the lower and upper endpoints of the confidence interval. This is done by using the formula \(\bar{x} \pm (multiplier * SE),\) where \(\bar{x}\) is our original sample mean and \(SE\) stands for standard error and corresponds to the standard deviation of the bootstrap distribution. The value of \(multiplier\) here is the appropriate percentile of the standard normal distribution.

      +

      These are automatically calculated when level is provided with level = 0.95 being the default. (95% of the values in a standard normal distribution fall within 1.96 standard deviations of the mean, so \(multiplier = 1.96\) for level = 0.95, for example.) As mentioned, this formula assumes that the bootstrap distribution is symmetric and bell-shaped. This is often the case with bootstrap distributions, especially those in which the original distribution of the sample is not highly skewed.

      +

      Definition: standard error

      +

      The standard error is the standard deviation of the sampling distribution.

      +

      The variability of the sampling distribution may be approximated by the variability of the bootstrap distribution. Traditional theory-based methodologies for inference also have formulas for standard errors, assuming some conditions are met.

      +

      This \(\bar{x} \pm (multiplier * SE)\) formula is implemented in the get_ci() function as shown with our pennies problem using the bootstrap distribution’s variability as an approximation for the sampling distribution’s variability. We’ll see more on this approximation shortly.

      +

      Note that the center of the confidence interval (the point_estimate) must be provided for the standard error confidence interval.

      +
      standard_error_ci <- bootstrap_distribution %>% 
      +  get_ci(type = "se", point_estimate = x_bar)
      +standard_error_ci
      +
      # A tibble: 1 x 2
      +  lower upper
      +  <dbl> <dbl>
      +1  21.0  29.3
      +
      bootstrap_distribution %>% 
      +  visualize(endpoints = standard_error_ci, direction = "between")
      +

      +

      We see that both methods produce nearly identical confidence intervals with the percentile method being \([20.97, 29.25]\) and the standard error method being \([20.97, 29.28]\).
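As a check on the standard error method, one could compute the interval by hand from the bootstrap distribution. This sketch uses 1.96 as the multiplier and the hypothetical names x_bar_value and bootstrap_se; it should land very close to the [20.97, 29.28] interval reported above.

# Sample mean of the original sample
x_bar_value <- pennies_sample %>% 
  summarize(mean_age = mean(age_in_2011)) %>% 
  pull(mean_age)

# Standard error approximated by the standard deviation of the bootstrap distribution
bootstrap_se <- bootstrap_distribution %>% 
  summarize(se = sd(stat)) %>% 
  pull(se)

# Standard error method interval computed "by hand"
c(lower = x_bar_value - 1.96 * bootstrap_se,
  upper = x_bar_value + 1.96 * bootstrap_se)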

      +
      +
      +
      +

      9.4 Comparing bootstrap and sampling distributions

      +

To help build up the idea of a confidence interval, we weren’t completely honest in our initial discussion. The pennies_sample data frame represents a sample from a larger number of pennies stored as pennies in the moderndive package. The pennies data frame (also in the moderndive package) contains 800 rows of data and two columns pertaining to the same variables as pennies_sample. Let’s begin by understanding some of the properties of the age_in_2011 variable in the pennies data frame.

      +
      ggplot(pennies, aes(x = age_in_2011)) +
      +  geom_histogram(bins = 10, color = "white")
      +

      +
      pennies %>% 
      +  summarize(mean_age = mean(age_in_2011),
      +            median_age = median(age_in_2011))
      +
      # A tibble: 1 x 2
      +  mean_age median_age
      +     <dbl>      <dbl>
      +1     21.2         20
      +

We see that pennies is slightly right-skewed, with the mean being pulled towards the upper outliers. Recall that pennies_sample was more symmetric than pennies. In fact, it actually exhibited some left skew when we compare the mean and median values.

      +
      ggplot(pennies_sample, aes(x = age_in_2011)) +
      +  geom_histogram(bins = 10, color = "white")
      +

      +
      pennies_sample %>% 
      +  summarize(mean_age = mean(age_in_2011),
      +            median_age = median(age_in_2011))
      +
      # A tibble: 1 x 2
      +  mean_age median_age
      +     <dbl>      <dbl>
      +1     25.1       25.5
      +
      +

      Sampling distribution

      +

      Let’s assume that pennies represents our population of interest. We can then create a sampling distribution for the population mean age of pennies, denoted by the Greek letter \(\mu\), using the rep_sample_n() function seen in Chapter 8. First we will create 1000 samples from the pennies data frame.

      +
      thousand_samples <- pennies %>% 
      +  rep_sample_n(size = 40, reps = 1000, replace = FALSE)
      +

      When creating a sampling distribution, we do not replace the items when we create each sample. This is in contrast to the bootstrap distribution. It’s important to remember that the sampling distribution is sampling without replacement from the population to better understand sample-to-sample variability, whereas the bootstrap distribution is sampling with replacement from our original sample to better understand potential sample-to-sample variability.

      +

      After sampling from pennies 1000 times, we next want to compute the mean age for each of the 1000 samples:

      +
      sampling_distribution <- thousand_samples %>% 
      +  group_by(replicate) %>% 
      +  summarize(stat = mean(age_in_2011))
      +

We could use ggplot() with geom_histogram() again, but since we’ve named our column in summarize() to be stat, we can also use the shortcut visualize() function in infer, specifying the number of bins and filling the bars with a different color such as "salmon". This will help us remember that "salmon" corresponds to “sampling distribution”.

      +
      sampling_distribution %>% 
      +  visualize(bins = 10, fill = "salmon")
      +
Figure 9.1: Sampling distribution for n=40 samples of pennies

      +
      +

      We can also examine the variability in this sampling distribution by calculating the standard deviation of the stat column. Remember that the standard deviation of the sampling distribution is the standard error, frequently denoted as se.

      +
      sampling_distribution %>% 
      +  summarize(se = sd(stat))
      +
      # A tibble: 1 x 1
      +     se
      +  <dbl>
      +1  2.01
      +
      +
      +

      Bootstrap distribution

      +

      Let’s now see how the shape of the bootstrap distribution compares to that of the sampling distribution. We’ll shade the bootstrap distribution blue to further assist with remembering which is which.

      +
      bootstrap_distribution %>% 
      +  visualize(bins = 10, fill = "blue")
      +

      +
      bootstrap_distribution %>% 
      +  summarize(se = sd(stat))
      +
      # A tibble: 1 x 1
      +     se
      +  <dbl>
      +1  2.12
      +

      Notice that while the standard deviations are similar, the center of the sampling distribution and the bootstrap distribution differ:

      +
      sampling_distribution %>% 
      +  summarize(mean_of_sampling_means = mean(stat))
      +
      # A tibble: 1 x 1
      +  mean_of_sampling_means
      +                   <dbl>
      +1                   21.2
      +
      bootstrap_distribution %>% 
      +  summarize(mean_of_bootstrap_means = mean(stat))
      +
      # A tibble: 1 x 1
      +  mean_of_bootstrap_means
      +                    <dbl>
      +1                    25.1
      +

      Since the bootstrap distribution is centered at the original sample mean, it doesn’t necessarily provide a good estimate of the overall population mean \(\mu\). Let’s calculate the mean of age_in_2011 for the pennies data frame to see how it compares to the mean of the sampling distribution and the mean of the bootstrap distribution.

      +
      pennies %>% 
      +  summarize(overall_mean = mean(age_in_2011))
      +
      # A tibble: 1 x 1
      +  overall_mean
      +         <dbl>
      +1         21.2
      +

Notice that this value matches up well with the mean of the sampling distribution. This is a consequence of the Central Limit Theorem introduced in Chapter 8: the mean of the sampling distribution is expected to be the mean of the overall population.

      +

The unfortunate fact, though, is that we don’t know the population mean in nearly all circumstances. The motivation for presenting it here was to show that the theory behind the Central Limit Theorem works using the tools you’ve worked with so far from the ggplot2, dplyr, moderndive, and infer packages.

      +

If a single sample mean by itself isn’t a reliable guess for the population mean, how should we go about estimating what the population mean may be when we can only select samples from the population? We’ve now come full circle and can discuss the underpinnings of the confidence interval and ways to interpret it.

      +
      +
      +
      +

      9.5 Interpreting the confidence interval

      +

      As shown above in Subsection 9.3.1, one range of plausible values for the population mean age of pennies in 2011, denoted by \(\mu\), is \([20.97, 29.25]\). Recall that this confidence interval is based on bootstrapping using pennies_sample. Note that the mean of pennies (21.152) does fall in this confidence interval. If we had a different sample of size 40 and constructed a confidence interval using the same method, would we be guaranteed that it contained the population parameter value as well? Let’s try it out:

      +
      pennies_sample2 <- pennies %>% 
      +  sample_n(size = 40)
      +

      Note the use of the sample_n() function in the dplyr package here. This does the same thing as rep_sample_n(reps = 1) but omits the extra replicate column.

      +

      We next create an infer pipeline to generate a percentile-based 95% confidence interval for \(\mu\):

      +
      percentile_ci2 <- pennies_sample2 %>% 
      +  specify(formula = age_in_2011 ~ NULL) %>% 
      +  generate(reps = 1000) %>% 
      +  calculate(stat = "mean") %>% 
      +  get_ci()
      +percentile_ci2
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   18.4    25.3
      +

This new confidence interval also contains the value of \(\mu\). Let’s further investigate by repeating this process 100 times to get 100 different confidence intervals derived from 100 different samples of pennies. Each sample will have a size of 40, just as the original sample did. We will plot each of these confidence intervals as horizontal lines. We will also show a line corresponding to the known population value of 21.152 years.

      +

      +

      Of the 100 confidence intervals based on samples of size \(n = 40\), 96 of them captured the population mean \(\mu = 21.152\), whereas 4 of them did not include it. If we repeated this process of building confidence intervals more times with more samples, we’d expect 95% of them to contain the population mean. In other words, the procedure we have used to generate confidence intervals is “95% reliable” in that we can expect it to include the true population parameter 95% of the time if the process is repeated.
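The code that generated these 100 intervals is not shown above, but a sketch of one way such a simulation could be written follows. The name ci_coverage is hypothetical, the purrr package is assumed to be available, and the loop is slow since each of the 100 samples is bootstrapped 1000 times.

library(purrr)

# For each of 100 samples of size 40, build a percentile-based 95% CI
ci_coverage <- map_dfr(1:100, function(i) {
  pennies %>% 
    sample_n(size = 40) %>% 
    specify(response = age_in_2011) %>% 
    generate(reps = 1000, type = "bootstrap") %>% 
    calculate(stat = "mean") %>% 
    get_ci(level = 0.95, type = "percentile")
})

# Count how many of the 100 intervals capture the population mean of 21.152
ci_coverage %>% 
  summarize(captured = sum(`2.5%` <= 21.152 & 21.152 <= `97.5%`))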

      +

      To further accentuate this point, let’s perform a similar procedure using 90% confidence intervals instead. This time we will use the standard error method instead of the percentile method for computing the confidence intervals.

      +

      +

Of the 100 confidence intervals based on samples of size \(n = 40\), 87 of them captured the population mean \(\mu = 21.152\), whereas 13 of them did not include it. Repeating this process for more samples would result in us getting closer and closer to 90% of the confidence intervals including the true value. When interpreting a confidence interval, it is common to say that we are “95% confident” or “90% confident” that the true value falls within the range of the specified confidence interval. We will use this “confident” language throughout the rest of this chapter, but remember that it has more to do with a measure of reliability of the interval-building process.

      +
      +

      Back to our pennies example

      +

      After this elaboration on what the level corresponds to in a confidence interval, let’s conclude by providing an interpretation of the original confidence interval result we found in Subsection 9.3.1.

      +

      Interpretation: We are 95% confident that the true mean age of pennies in circulation in 2011 is between 20.972 and 29.252 years. This level of confidence is based on the percentile-based method including the true mean 95% of the time if many different samples (not just the one we used) were collected and confidence intervals were created.

      +
      +
      +
      +

      9.6 EXAMPLE: One proportion

      +

      Let’s revisit our exercise of trying to estimate the proportion of red balls in the bowl from Chapter 8. We are now interested in determining a confidence interval for population parameter \(p\), the proportion of balls that are red out of the total \(N = 2400\) red and white balls.

      +

      We will use the first sample reported from Ilyas and Yohan in Subsection 8.2.2 for our point estimate. They observed 21 red balls out of the 50 in their shovel. This data is stored in the tactile_shovel1 data frame in the moderndive package.

      + +
      tactile_shovel1
      +
      # A tibble: 50 x 1
      +   color
      +   <chr>
      + 1 red  
      + 2 red  
      + 3 white
      + 4 red  
      + 5 white
      + 6 red  
      + 7 red  
      + 8 white
      + 9 red  
      +10 white
      +# … with 40 more rows
      +
      +

      9.6.1 Observed Statistic

      +

      To compute the proportion that are red in this data we can use the specify() %>% calculate() workflow. Note the use of the success argument here to clarify which of the two colors "red" or "white" we are interested in.

      +
      p_hat <- tactile_shovel1 %>% 
      +  specify(formula = color ~ NULL, success = "red") %>% 
      +  calculate(stat = "prop")
      +p_hat
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  0.42
      +
      +
      +

      9.6.2 Bootstrap distribution

      +

      Next we want to calculate many different bootstrap samples and their corresponding bootstrap statistic (the proportion of red balls). We’ve done 1000 in the past, but let’s go up to 10,000 now to better see the resulting distribution. Recall that this is done by including a generate() function call in the middle of our pipeline:

      +
      tactile_shovel1 %>% 
      +  specify(formula = color ~ NULL, success = "red") %>% 
      +  generate(reps = 10000)
      +

      This results in 50 rows for each of the 10,000 replicates. Lastly, we finish the infer pipeline by adding back in the calculate() step.

      +
      bootstrap_props <- tactile_shovel1 %>% 
      +  specify(formula = color ~ NULL, success = "red") %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "prop")
      +

      Let’s visualize() what the resulting bootstrap distribution looks like as a histogram. We’ve adjusted the number of bins here as well to better see the resulting shape.

      +
      bootstrap_props %>% visualize(bins = 25)
      +

      +

      We see that the resulting distribution is symmetric and bell-shaped so it doesn’t much matter which confidence interval method we choose. Let’s use the standard error method to create a 95% confidence interval.

      +
      standard_error_ci <- bootstrap_props %>% 
      +  get_ci(type = "se", level = 0.95, point_estimate = p_hat)
      +standard_error_ci
      +
      # A tibble: 1 x 2
      +  lower upper
      +  <dbl> <dbl>
      +1 0.284 0.556
      +
      bootstrap_props %>% 
      +  visualize(bins = 25, endpoints = standard_error_ci)
      +

      +

We are 95% confident that the true proportion of red balls in the bowl is between 0.284 and 0.556. This level of confidence is based on the standard error-based method including the true proportion 95% of the time if many different samples (not just the one we used) were collected and confidence intervals were created.

      +
      +
      +

      9.6.3 Theory-based confidence intervals

      +

      When the bootstrap distribution has the nice symmetric, bell shape that we saw in the red balls example above, we can also use a formula to quantify the standard error. This provides another way to compute a confidence interval, but is a little more tedious and mathematical. The steps are outlined below. We’ve also shown how we can use the confidence interval (CI) interpretation in this case as well to support your understanding of this tricky concept.

      +
      +

      Procedure for building a theory-based CI for \(p\)

      +

To construct a theory-based confidence interval for \(p\), the unknown true population proportion, we:

      +
1. Collect a sample of size \(n\)
2. Compute \(\widehat{p}\)
3. Compute the standard error \[\text{SE} = \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
4. Compute the margin of error \[\text{MoE} = 1.96 \cdot \text{SE} = 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
5. Compute both end points of the confidence interval:
   • The lower end point lower_ci: \[\widehat{p} - \text{MoE} = \widehat{p} - 1.96 \cdot \text{SE} = \widehat{p} - 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
   • The upper end point upper_ci: \[\widehat{p} + \text{MoE} = \widehat{p} + 1.96 \cdot \text{SE} = \widehat{p} + 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
6. Alternatively, you can succinctly summarize a 95% confidence interval for \(p\) using the \(\pm\) symbol:
      +

\[\widehat{p} \pm \text{MoE} = \widehat{p} \pm 1.96 \cdot \text{SE} = \widehat{p} \pm 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
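As a quick sketch of these steps applied to Ilyas and Yohan’s sample of 21 red balls out of 50, the arithmetic below reproduces a theory-based 95% interval of roughly [0.283, 0.557], very close to the bootstrap standard error interval of [0.284, 0.556] found earlier. The variable names here are for illustration only.

p_hat_iy <- 21 / 50                               # sample proportion red
se_iy    <- sqrt(p_hat_iy * (1 - p_hat_iy) / 50)  # standard error formula above
moe_iy   <- 1.96 * se_iy                          # margin of error
c(lower_ci = p_hat_iy - moe_iy, upper_ci = p_hat_iy + moe_iy)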

      +
      +
      +

      Confidence intervals based on 33 tactile samples

      +

      Let’s load the tactile sampling data for the 33 groups from Chapter 8. Recall this data was saved in the tactile_prop_red data frame included in the moderndive package.

      + +
      tactile_prop_red
      +

      Let’s now apply the above procedure for constructing confidence intervals for \(p\) using the data saved in tactile_prop_red by adding/modifying new columns using the dplyr package data wrangling tools seen in Chapter 5:

      +
1. Rename prop_red to p_hat, the official name of the sample proportion
2. Make explicit the sample size n of \(n = 50\)
3. Add the standard error SE
4. Add the margin of error MoE
5. Add the left endpoint of the confidence interval lower_ci
6. Add the right endpoint of the confidence interval upper_ci
      +
      conf_ints <- tactile_prop_red %>% 
      +  rename(p_hat = prop_red) %>% 
      +  mutate(
      +    n = 50,
      +    SE = sqrt(p_hat * (1 - p_hat) / n),
      +    MoE = 1.96 * SE,
      +    lower_ci = p_hat - MoE,
      +    upper_ci = p_hat + MoE
      +  )
      +conf_ints
| group | red_balls | p_hat | n | SE | MoE | lower_ci | upper_ci |
|---|---|---|---|---|---|---|---|
| Ilyas, Yohan | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| Morgan, Terrance | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471 |
| Martin, Thomas | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| Clark, Frank | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| Riddhi, Karina | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493 |
| Andrew, Tyler | 19 | 0.38 | 50 | 0.069 | 0.135 | 0.245 | 0.515 |
| Julia | 19 | 0.38 | 50 | 0.069 | 0.135 | 0.245 | 0.515 |
| Rachel, Lauren | 11 | 0.22 | 50 | 0.059 | 0.115 | 0.105 | 0.335 |
| Daniel, Caroline | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427 |
| Josh, Maeve | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471 |
| Emily, Emily | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449 |
| Conrad, Emily | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493 |
| Oliver, Erik | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471 |
| Isabel, Nam | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| X, Claire | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427 |
| Cindy, Kimberly | 20 | 0.40 | 50 | 0.069 | 0.136 | 0.264 | 0.536 |
| Kevin, James | 11 | 0.22 | 50 | 0.059 | 0.115 | 0.105 | 0.335 |
| Nam, Isabelle | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| Harry, Yuko | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427 |
| Yuki, Eileen | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449 |
| Ramses | 23 | 0.46 | 50 | 0.070 | 0.138 | 0.322 | 0.598 |
| Joshua, Elizabeth, Stanley | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427 |
| Siobhan, Jane | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493 |
| Jack, Will | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449 |
| Caroline, Katie | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| Griffin, Y | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493 |
| Kaitlin, Jordan | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471 |
| Ella, Garrett | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493 |
| Julie, Hailin | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427 |
| Katie, Caroline | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| Mallory, Damani, Melissa | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| Katie | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449 |
| Francis, Vignesh | 19 | 0.38 | 50 | 0.069 | 0.135 | 0.245 | 0.515 |
      +

      Let’s plot:

      +
1. These 33 confidence intervals for \(p\): from lower_ci to upper_ci
2. The true population proportion \(p = 900 / 2400 = 0.375\) with a red vertical line
      +
Figure 9.2: 33 confidence intervals based on 33 tactile samples of size n=50

      +
      +

      We see that:

      +
• In 31 cases, the confidence intervals “capture” the true \(p = 900 / 2400 = 0.375\)
• In 2 cases, the confidence intervals do not “capture” the true \(p = 900 / 2400 = 0.375\)
      +

Thus, the confidence intervals capture the true proportion \(31 / 33 = 93.939\%\) of the time using this theory-based methodology.

      +
      +
      +

      Confidence intervals based on 100 virtual samples

      +

Let’s say, however, that we repeated the above 100 times, not tactilely, but virtually. We’ll do this only 100 times instead of 1000 like we did before so that the results can fit on the screen. Again, the steps for computing a 95% confidence interval for \(p\) are:

      +
1. Collect a sample of size \(n = 50\) as we did in Chapter 8
2. Compute \(\widehat{p}\): the sample proportion red of these \(n = 50\) balls
3. Compute the standard error \(\text{SE} = \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
4. Compute the margin of error \(\text{MoE} = 1.96 \cdot \text{SE} = 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
5. Compute both end points of the confidence interval:
   • lower_ci: \(\widehat{p} - \text{MoE} = \widehat{p} - 1.96 \cdot \text{SE} = \widehat{p} - 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
   • upper_ci: \(\widehat{p} + \text{MoE} = \widehat{p} + 1.96 \cdot \text{SE} = \widehat{p} + 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
      +

      Run the following three steps, being sure to View() the resulting data frame after each step so you can convince yourself of what’s going on:

      +
      # First: Take 100 virtual samples of n=50 balls
      +virtual_samples <- bowl %>% 
      +  rep_sample_n(size = 50, reps = 100)
      +
      +# Second: For each virtual sample compute the proportion red
      +virtual_prop_red <- virtual_samples %>% 
      +  group_by(replicate) %>% 
      +  summarize(red = sum(color == "red")) %>% 
      +  mutate(prop_red = red / 50)
      +
      +# Third: Compute the 95% confidence interval as above
      +virtual_prop_red <- virtual_prop_red %>% 
      +  rename(p_hat = prop_red) %>% 
      +  mutate(
      +    n = 50,
      +    SE = sqrt(p_hat*(1-p_hat)/n),
      +    MoE = 1.96 * SE,
      +    lower_ci = p_hat - MoE,
      +    upper_ci = p_hat + MoE
      +  )
      +

      Here are the results:

      +
Figure 9.3: 100 confidence intervals based on 100 virtual samples of size n=50

      +
      +

We see that of our 100 confidence intervals based on samples of size \(n = 50\), 96 of them captured the true \(p = 900/2400\), whereas 4 of them missed. As we create more and more confidence intervals based on more and more samples, about 95% of these intervals will capture the true value of \(p\). In other words, our procedure is “95% reliable.”

      +

Theory-based methods like this were largely used in the past because we didn’t have the computing power to perform simulation-based methods such as bootstrapping. They are still commonly used, though, and if the normality assumptions are met, they can provide a nice option for finding confidence intervals and performing hypothesis tests, as we will see in Chapter 10.

      +
      +
      +
      +
      +

      9.7 EXAMPLE: Comparing two proportions

      +

      If you see someone else yawn, are you more likely to yawn? In an episode of the show Mythbusters, they tested the myth that yawning is contagious. The snippet from the show is available to view in the United States on the Discovery Network website here. More information about the episode is also available on IMDb here.

      +

      Fifty adults who thought they were being considered for an appearance on the show were interviewed by a show recruiter (“confederate”) who either yawned or did not. Participants then sat by themselves in a large van and were asked to wait. While in the van, the Mythbusters watched via hidden camera to see if the unaware participants yawned. The data frame containing the results is available at mythbusters_yawn in the moderndive package. Let’s check it out.

      +
      mythbusters_yawn
      +
      # A tibble: 50 x 3
      +    subj group   yawn 
      +   <int> <chr>   <chr>
      + 1     1 seed    yes  
      + 2     2 control yes  
      + 3     3 seed    no   
      + 4     4 seed    yes  
      + 5     5 seed    no   
      + 6     6 control no   
      + 7     7 seed    yes  
      + 8     8 control no   
      + 9     9 control no   
      +10    10 seed    no   
      +# … with 40 more rows
      +
• The participant ID is stored in the subj variable with values of 1 to 50.
• The group variable is either "seed" for when a confederate was trying to influence the participant or "control" if a confederate did not interact with the participant.
• The yawn variable is either "yes" if the participant yawned or "no" if the participant did not yawn.
      +

      We can use the janitor package to get a glimpse into this data in a table format:

      +
      mythbusters_yawn %>% 
      +  tabyl(group, yawn) %>% 
      +  adorn_percentages() %>% 
      +  adorn_pct_formatting() %>% 
      +  # To show original counts
      +  adorn_ns()
      +
         group         no        yes
      + control 75.0% (12) 25.0%  (4)
      +    seed 70.6% (24) 29.4% (10)
      +

      We are interested in comparing the proportion of those that yawned after seeing a seed versus those that yawned with no seed interaction. We’d like to see if the difference between these two proportions is significantly larger than 0. If so, we’d have evidence to support the claim that yawning is contagious based on this study.

      +

      In looking over this problem, we can make note of some important details to include in our infer pipeline:

      +
• We are calling a success having a yawn value of "yes".
• Our response variable will always correspond to the variable used in the success, so the response variable is yawn.
• The explanatory variable is the other variable of interest here: group.
      +

To summarize, we are looking to examine the relationship between yawning and whether or not the participant saw a seed yawn.

      +
      +

      9.7.1 Compute the point estimate

      +
      mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group)
      +
      Error: A level of the response variable `yawn` needs to be specified for the `success` argument in `specify()`.
      +

      Note that the success argument must be specified in situations such as this where the response variable has only two levels.

      +
      mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes")
      +
      Response: yawn (factor)
      +Explanatory: group (factor)
      +# A tibble: 50 x 2
      +   yawn  group  
      +   <fct> <fct>  
      + 1 yes   seed   
      + 2 yes   control
      + 3 no    seed   
      + 4 yes   seed   
      + 5 no    seed   
      + 6 no    control
      + 7 yes   seed   
      + 8 no    control
      + 9 no    control
      +10 no    seed   
      +# … with 40 more rows
      +

      We next want to calculate the statistic of interest for our sample. This corresponds to the difference in the proportion of successes.

      +
      mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes") %>% 
      +  calculate(stat = "diff in props")
      +
      Error: Statistic is based on a difference; specify the `order` in which to subtract the levels of the explanatory variable. `order = c("first", "second")` means `("first" - "second")`. Check `?calculate` for details.
      +

We see another error here. To make sure that R knows exactly what we are after, we need to provide the order in which R should subtract these proportions of successes. As the error message states, we’ll want to put "seed" first after c() and then "control": order = c("seed", "control"). Our point estimate is thus calculated:

      +
      obs_diff <- mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes") %>% 
      +  calculate(stat = "diff in props", order = c("seed", "control"))
      +obs_diff
      +
      # A tibble: 1 x 1
      +    stat
      +   <dbl>
      +1 0.0441
      +

This value represents the proportion of those that yawned after seeing a seed yawn (0.2941) minus the proportion of those that yawned without seeing a seed (0.25).
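A small dplyr sketch can verify these two group proportions directly from the raw data; it should agree with the janitor table shown earlier (roughly 0.294 for the seed group and 0.25 for the control group).

# Proportion of "yes" yawns within each group
mythbusters_yawn %>% 
  group_by(group) %>% 
  summarize(prop_yawn_yes = mean(yawn == "yes"))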

      +
      +
      +

      9.7.2 Bootstrap distribution

      +

Our next step in building a confidence interval is to create a bootstrap distribution of statistics (differences in proportions of successes). We saw how this works with a single variable both in computing bootstrap means in Subsection 9.1.3 and in computing bootstrap proportions in Section 9.6, but we haven’t yet worked with bootstrapping involving multiple variables.

      +

      In the infer package, bootstrapping with multiple variables means that each row is potentially resampled. Let’s investigate this by looking at the first few rows of mythbusters_yawn:

      +
      head(mythbusters_yawn)
      +
      # A tibble: 6 x 3
      +   subj group   yawn 
      +  <int> <chr>   <chr>
      +1     1 seed    yes  
      +2     2 control yes  
      +3     3 seed    no   
      +4     4 seed    yes  
      +5     5 seed    no   
      +6     6 control no   
      +

      When we bootstrap this data, we are potentially pulling the subject’s readings multiple times. Thus, we could see the entries of "seed" for group and "no" for yawn together in a new row in a bootstrap sample. This is further seen by exploring the sample_n() function in dplyr on this smaller 6 row data frame comprised of head(mythbusters_yawn). The sample_n() function can perform this bootstrapping procedure and is similar to the rep_sample_n() function in infer, except that it is not repeated but rather only performs one sample with or without replacement.

      +
      set.seed(2019)
      +
      head(mythbusters_yawn) %>% 
      +  sample_n(size = 6, replace = TRUE)
      +
      # A tibble: 6 x 3
      +   subj group   yawn 
      +  <int> <chr>   <chr>
      +1     5 seed    no   
      +2     5 seed    no   
      +3     2 control yes  
      +4     4 seed    yes  
      +5     1 seed    yes  
      +6     1 seed    yes  
      +

      We can see that in this bootstrap sample generated from the first six rows of mythbusters_yawn, we have some rows repeated. The same is true when we perform the generate() step in infer as done below.

      +
      bootstrap_distribution <- mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes") %>% 
      +  generate(reps = 1000) %>% 
      +  calculate(stat = "diff in props", order = c("seed", "control"))
      +
      bootstrap_distribution %>% visualize(bins = 20)
      +

      +

      This distribution is roughly symmetric and bell-shaped but isn’t quite there. Let’s use the percentile-based method to compute a 95% confidence interval for the true difference in the proportion of those that yawn with and without a seed presented. The arguments are explicitly listed here but remember they are the defaults and simply get_ci() can be used.

      +
      bootstrap_distribution %>% 
      +  get_ci(type = "percentile", level = 0.95)
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1 -0.219   0.293
      +

The confidence interval shown here includes the value of 0. We’ll see further in Chapter 10 what this means in terms of this difference being statistically significant or not, but let’s examine it a bit here first. The range of plausible values for the difference in the proportion of those that yawned with and without a seed is between -0.219 and 0.293.

      +

      Therefore, we are not sure which proportion is larger. Some of the bootstrap statistics showed the proportion without a seed to be higher and others showed the proportion with a seed to be higher. If the confidence interval was entirely above zero, we would be relatively sure (about “95% confident”) that the seed group had a higher proportion of yawning than the control group.

      +

Note that this all relates to the importance of denoting the order argument in the calculate() function. Since we specified "seed" and then "control", positive values for the statistic correspond to the "seed" proportion being higher, whereas negative values correspond to the "control" group being higher.

      +

The Mythbusters show declared the myth that “yawning is contagious” to be “confirmed”; via this confidence interval, we have evidence suggesting that such a conclusion is not statistically appropriate.

      +
      +

Learning check

      +
      +

      Practice problems to come soon!

      +
      + +
      +
      +
      +
      +

      9.8 Conclusion

      +
      +

      9.8.1 What’s to come?

      +

      This chapter introduced the notions of bootstrapping and confidence intervals as ways to build intuition about population parameters using only the original sample information. We also concluded with a glimpse into statistical significance and we’ll dig much further into this in Chapter 10 up next!

      +
      +
      +

      9.8.2 Script of R code

      +

      An R script file of all R code used in this chapter is available here.


      A Statistical Background

      +
      +

      A.1 Basic statistical terms

      +
      +

      A.1.1 Mean

      +

      The mean is the most commonly reported measure of center. It is commonly called the “average” though this term can be a little ambiguous. The mean is the sum of all of the data elements divided by how many elements there are. If we have \(n\) data points, the mean is given by: \[Mean = \frac{x_1 + x_2 + \cdots + x_n}{n}\]

      +
      +
      +

      A.1.2 Median

      +

      The median is calculated by first sorting a variable’s data from smallest to largest. After sorting the data, the middle element in the list is the median. If the middle falls between two values, then the median is the mean of those two values.

      +
      +
      +

      A.1.3 Standard deviation

      +

We will next discuss the standard deviation of a sample dataset pertaining to one variable. The formula can be a little intimidating at first, but it is important to remember that it is essentially a measure of how far we expect a given data value to be from its mean:

      +

      \[Standard \, deviation = \sqrt{\frac{(x_1 - Mean)^2 + (x_2 - Mean)^2 + \cdots + (x_n - Mean)^2}{n - 1}}\]
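For a concrete illustration, the small made-up vector below shows how these three measures are computed in R.

# Toy data for illustration only
x <- c(2, 4, 4, 7, 9)
mean(x)     # (2 + 4 + 4 + 7 + 9) / 5 = 5.2
median(x)   # middle value of the sorted data, here 4
sd(x)       # sample standard deviation, using n - 1 in the denominator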

      +
      +
      +

      A.1.4 Five-number summary

      +

The five-number summary consists of five values: minimum, first quantile (25th percentile), median (50th percentile), third quantile (75th percentile), and maximum. The quantiles are calculated as

      +
• first quantile (\(Q_1\)): the median of the first half of the sorted data
• third quantile (\(Q_3\)): the median of the second half of the sorted data
      +

      The interquartile range is defined as \(Q_3 - Q_1\) and is a measure of how spread out the middle 50% of values is. The five-number summary is not influenced by the presence of outliers in the ways that the mean and standard deviation are. It is, thus, recommended for skewed datasets.
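Continuing with a toy example, R can report the five-number summary and interquartile range directly; note that R offers several slightly different quantile conventions, so the quartiles it reports may differ a little from the “median of each half” description above.

# Toy data for illustration only
x <- c(2, 4, 4, 7, 9)
fivenum(x)   # minimum, Q1, median, Q3, maximum
IQR(x)       # Q3 - Q1, the interquartile range
summary(x)   # five-number summary plus the mean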

      +
      +
      +

      A.1.5 Distribution

      +

      The distribution of a variable/dataset corresponds to generalizing patterns in the dataset. It often shows how frequently elements in the dataset appear. It shows how the data varies and gives some information about where a typical element in the data might fall. Distributions are most easily seen through data visualization.

      +
      +
      +

      A.1.6 Outliers

      +

      Outliers correspond to values in the dataset that fall far outside the range of “ordinary” values. In regards to a boxplot (by default), they correspond to values below \(Q_1 - (1.5 * IQR)\) or above \(Q_3 + (1.5 * IQR)\).

      +

      Note that these terms (aside from Distribution) only apply to quantitative variables.


      B Inference Examples

      +

      This appendix is designed to provide you with examples of the five basic hypothesis tests and their corresponding confidence intervals. Traditional theory-based methods as well as computational-based methods are presented.

      +
      +

Note: This appendix is still under construction. If you would like to contribute, please check us out on GitHub at https://github.com/moderndive/moderndive_book.

      +

Please check out our sneak peek of infer below in the meanwhile. For more details on infer, visit https://infer.netlify.com/.

      +
      +
      +
      +

      Needed packages

      +
      library(dplyr)
      +library(ggplot2)
      +library(infer)
      +library(knitr)
      +library(readr)
      +library(janitor)
      +
      +
      +

      B.1 Inference mind map

      +

      To help you better navigate and choose the appropriate analysis, we’ve created a mind map on http://coggle.it available here and below.

      +
Figure B.1: Mind map for Inference

      +
      +
      +
      +

      B.2 One mean

      +
      +

      B.2.1 Problem statement

      +

The National Survey of Family Growth conducted by the Centers for Disease Control gathers information on family life, marriage and divorce, pregnancy, infertility, use of contraception, and men’s and women’s health. One of the variables collected on this survey is the age at first marriage. 5,534 randomly sampled US women between 2006 and 2010 completed the survey. The women sampled here had been married at least once. Do we have evidence that the mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years? (Tweaked a bit from Diez, Barr, and Çetinkaya-Rundel 2014 [Chapter 4])

      +
      +
      +

      B.2.2 Competing hypotheses

      +
      +

      In words

      +
• Null hypothesis: The mean age of first marriage for all US women from 2006 to 2010 is equal to 23 years.
• Alternative hypothesis: The mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years.
      +
      +
      +

      In symbols (with annotations)

      +
• \(H_0: \mu = \mu_{0}\), where \(\mu\) represents the mean age of first marriage for all US women from 2006 to 2010 and \(\mu_0\) is 23.
• \(H_A: \mu > 23\)
      +
      +
      +

      Set \(\alpha\)

      +

      It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      +
      +
      +
      +

      B.2.3 Exploring the sample data

age_at_marriage <- read_csv("https://moderndive.com/data/ageAtMar.csv")

age_summ <- age_at_marriage %>%
  summarize(sample_size = n(),
    mean = mean(age),
    sd = sd(age),
    minimum = min(age),
    lower_quartile = quantile(age, 0.25),
    median = median(age),
    upper_quartile = quantile(age, 0.75),
    max = max(age))
kable(age_summ)

sample_size  mean    sd  minimum  lower_quartile  median  upper_quartile  max
       5534  23.4  4.72       10              20      23              26   43

The histogram below also shows the distribution of age.

ggplot(data = age_at_marriage, mapping = aes(x = age)) +
  geom_histogram(binwidth = 3, color = "white")

The observed statistic of interest here is the sample mean:

x_bar <- age_at_marriage %>% 
  specify(response = age) %>% 
  calculate(stat = "mean")
x_bar

# A tibble: 1 x 1
   stat
  <dbl>
1  23.4

      Guess about statistical significance

We are looking to see if the observed sample mean of 23.44 is statistically greater than \(\mu_0 = 23\). They seem to be quite close, but we have a large sample size here. Let’s guess that the large sample size will lead us to reject this practically small difference.

      B.2.4 Non-traditional methods

Bootstrapping for hypothesis test

In order to look to see if the observed sample mean of 23.44 is statistically greater than \(\mu_0 = 23\), we need to account for the sample size. We also need to determine a process that replicates how the original sample of size 5534 was selected.

We can use the idea of bootstrapping to simulate the population from which the sample came and then generate samples from that simulated population to account for sampling variability. Recall how bootstrapping would apply in this context:

1. Sample with replacement from our original sample of 5534 women and repeat this process 10,000 times,
2. calculate the mean for each of the 10,000 bootstrap samples created in Step 1,
3. combine all of these bootstrap statistics calculated in Step 2 into a boot_distn object, and
4. shift the center of this distribution over to the null value of 23. (This is needed since it will be centered at 23.44 via the process of bootstrapping.)

set.seed(2018)
null_distn_one_mean <- age_at_marriage %>% 
  specify(response = age) %>% 
  hypothesize(null = "point", mu = 23) %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "mean")

null_distn_one_mean %>% visualize()

      We can next use this distribution to observe our \(p\)-value. Recall this is a right-tailed test so we will be looking for values that are greater than or equal to 23.44 for our \(p\)-value.

null_distn_one_mean %>%
  visualize(obs_stat = x_bar, direction = "greater")

Calculate \(p\)-value

pvalue <- null_distn_one_mean %>%
  get_pvalue(obs_stat = x_bar, direction = "greater")
pvalue

# A tibble: 1 x 1
  p_value
    <dbl>
1       0

So our \(p\)-value is 0 and we reject the null hypothesis at the 5% level. You can also see from the histogram above that we are far into the tail of the null distribution.

      Bootstrapping for confidence interval

We can also create a confidence interval for the unknown population parameter \(\mu\) using our sample data via bootstrapping. Note that we don’t need to shift this distribution since we want the center of our confidence interval to be our point estimate \(\bar{x}_{obs} = 23.44\).

boot_distn_one_mean <- age_at_marriage %>% 
  specify(response = age) %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "mean")

ci <- boot_distn_one_mean %>% 
  get_ci()
ci

# A tibble: 1 x 2
  `2.5%` `97.5%`
   <dbl>   <dbl>
1   23.3    23.6

boot_distn_one_mean %>% 
  visualize(endpoints = ci, direction = "between")

We see that 23 is not contained in this confidence interval as a plausible value of \(\mu\) (the unknown population mean) and the entire interval is larger than 23. This matches with our hypothesis test results of rejecting the null hypothesis in favor of the alternative (\(\mu > 23\)).

Interpretation: We are 95% confident the true mean age of first marriage for all US women from 2006 to 2010 is between 23.316 and 23.565.

      B.2.5 Traditional methods

Check conditions

Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.

1. Independent observations: The observations are collected independently.

   The cases are selected independently through random sampling so this condition is met.

2. Approximately normal: The distribution of the response variable should be normal or the sample size should be at least 30.

   The histogram for the sample above does show some skew.

The Q-Q plot below also shows some skew.

ggplot(data = age_at_marriage, mapping = aes(sample = age)) +
  stat_qq()

      The sample size here is quite large though (\(n = 5534\)) so both conditions are met.

Test statistic

      The test statistic is a random variable based on the sample data. Here, we want to look at a way to estimate the population mean \(\mu\). A good guess is the sample mean \(\bar{X}\). Recall that this sample mean is actually a random variable that will vary as different samples are (theoretically, would be) collected. We are looking to see how likely is it for us to have observed a sample mean of \(\bar{x}_{obs} = 23.44\) or larger assuming that the population mean is 23 (assuming the null hypothesis is true). If the conditions are met and assuming \(H_0\) is true, we can “standardize” this original test statistic of \(\bar{X}\) into a \(T\) statistic that follows a \(t\) distribution with degrees of freedom equal to \(df = n - 1\):

\[ T =\dfrac{ \bar{X} - \mu_0}{ S / \sqrt{n} } \sim t (df = n - 1) \]

where \(S\) represents the standard deviation of the sample and \(n\) is the sample size.

Observed test statistic

      While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and in understanding which formula for the test statistic applies. We can use the t_test() function to perform this analysis for us.

t_test_results <- age_at_marriage %>% 
  infer::t_test(formula = age ~ NULL,
       alternative = "greater",
       mu = 23)
t_test_results

# A tibble: 1 x 6
  statistic  t_df  p_value alternative lower_ci upper_ci
      <dbl> <dbl>    <dbl> <chr>          <dbl>    <dbl>
1      6.94  5533 2.25e-12 greater         23.3      Inf

      We see here that the \(t_{obs}\) value is 6.936.
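As a quick check (an added sketch, assuming the age_at_marriage data frame loaded above), the same value can be computed by plugging the summary statistics into the formula by hand:

age_at_marriage %>% 
  summarize(t_obs = (mean(age) - 23) / (sd(age) / sqrt(n())))

This should return approximately 6.94, matching the statistic column of the t_test() output.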

Compute \(p\)-value

The \(p\)-value—the probability of observing a \(t_{obs}\) value of 6.936 or more in our null distribution of a \(t\) with 5533 degrees of freedom—is essentially 0.
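As an added aside, this tail probability can also be computed directly from the \(t\) distribution in R; it matches the p_value column of the t_test() output above:

pt(6.936, df = 5533, lower.tail = FALSE)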


      State conclusion

We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that our observed sample mean was statistically greater than the hypothesized mean has supporting evidence here. Based on this sample, we have evidence that the mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years.

      Confidence interval

t.test(x = age_at_marriage$age, 
       alternative = "two.sided",
       mu = 23)$conf

[1] 23.3 23.6
attr(,"conf.level")
[1] 0.95

      B.2.6 Comparing results

Observing the bootstrap and null distributions that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions also being met (the large sample size was the driver here) leads us to better guess that using any of the methods, whether traditional (formula-based) or non-traditional (computational-based), will lead to similar results.

      B.3 One proportion


      B.3.1 Problem statement

The CEO of a large electric utility claims that 80 percent of his 1,000,000 customers are satisfied with the service they receive. To test this claim, the local newspaper surveyed 100 customers, using simple random sampling. 73 were satisfied and the remaining were unsatisfied. Based on these findings from the sample, can we reject the CEO’s hypothesis that 80% of the customers are satisfied? [Tweaked a bit from http://stattrek.com/hypothesis-test/proportion.aspx?Tutorial=AP]

      B.3.2 Competing hypotheses

In words

• Null hypothesis: The proportion of all customers of the large electric utility satisfied with service they receive is equal to 0.80.

• Alternative hypothesis: The proportion of all customers of the large electric utility satisfied with service they receive is different from 0.80.

      In symbols (with annotations)

• \(H_0: \pi = p_{0}\), where \(\pi\) represents the proportion of all customers of the large electric utility satisfied with service they receive and \(p_0\) is 0.8.

• \(H_A: \pi \ne 0.8\)

      Set \(\alpha\)

It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      B.3.3 Exploring the sample data

elec <- c(rep("satisfied", 73), rep("unsatisfied", 27)) %>% 
  as_data_frame() %>% 
  rename(satisfy = value)

The bar graph below also shows the distribution of satisfy.

ggplot(data = elec, aes(x = satisfy)) + 
  geom_bar()

The observed statistic is computed as

p_hat <- elec %>% 
  specify(response = satisfy, success = "satisfied") %>% 
  calculate(stat = "prop")
p_hat

# A tibble: 1 x 1
   stat
  <dbl>
1  0.73

      Guess about statistical significance

We are looking to see if the sample proportion of 0.73 is statistically different from \(p_0 = 0.8\) based on this sample. They seem to be quite close, and our sample size is not huge here (\(n = 100\)). Let’s guess that we do not have evidence to reject the null hypothesis.

      B.3.4 Non-traditional methods

Simulation for hypothesis test

      In order to look to see if 0.73 is statistically different from 0.8, we need to account for the sample size. We also need to determine a process that replicates how the original sample of size 100 was selected. We can use the idea of an unfair coin to simulate this process. We will simulate flipping an unfair coin (with probability of success 0.8 matching the null hypothesis) 100 times. Then we will keep track of how many heads come up in those 100 flips. Our simulated statistic matches with how we calculated the original statistic \(\hat{p}\): the number of heads (satisfied) out of our total sample of 100. We then repeat this process many times (say 10,000) to create the null distribution looking at the simulated proportions of successes:

set.seed(2018)
null_distn_one_prop <- elec %>% 
  specify(response = satisfy, success = "satisfied") %>% 
  hypothesize(null = "point", p = 0.8) %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "prop")

null_distn_one_prop %>% visualize()

We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are 0.8 - 0.73 = 0.07 away from 0.8 in BOTH directions for our \(p\)-value:

null_distn_one_prop %>% 
  visualize(obs_stat = p_hat, direction = "both")

Calculate \(p\)-value

pvalue <- null_distn_one_prop %>% 
  get_pvalue(obs_stat = p_hat, direction = "both")
pvalue

# A tibble: 1 x 1
  p_value
    <dbl>
1  0.0813

      So our \(p\)-value is 0.081 and we fail to reject the null hypothesis at the 5% level.


      Bootstrapping for confidence interval

We can also create a confidence interval for the unknown population parameter \(\pi\) using our sample data. To do so, we use bootstrapping, which involves

1. sampling with replacement from our original sample of 100 survey respondents and repeating this process 10,000 times,
2. calculating the proportion of successes for each of the 10,000 bootstrap samples created in Step 1,
3. combining all of these bootstrap statistics calculated in Step 2 into a boot_distn object,
4. identifying the 2.5th and 97.5th percentiles of this distribution (corresponding to the 5% significance level chosen) to find a 95% confidence interval for \(\pi\), and
5. interpreting this confidence interval in the context of the problem.

boot_distn_one_prop <- elec %>% 
  specify(response = satisfy, success = "satisfied") %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "prop")

      Just as we use the mean function for calculating the mean over a numerical variable, we can also use it to compute the proportion of successes for a categorical variable where we specify what we are calling a “success” after the ==. (Think about the formula for calculating a mean and how R handles logical statements such as satisfy == "satisfied" for why this must be true.)
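For instance (a small added illustration using the elec data frame created above), the observed proportion could equivalently be computed as:

elec %>% 
  summarize(prop_satisfied = mean(satisfy == "satisfied"))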

ci <- boot_distn_one_prop %>% 
  get_ci()
ci

# A tibble: 1 x 2
  `2.5%` `97.5%`
   <dbl>   <dbl>
1   0.64    0.81

boot_distn_one_prop %>% 
  visualize(endpoints = ci, direction = "between")

      We see that 0.80 is contained in this confidence interval as a plausible value of \(\pi\) (the unknown population proportion). This matches with our hypothesis test results of failing to reject the null hypothesis.

Interpretation: We are 95% confident the true proportion of customers who are satisfied with the service they receive is between 0.64 and 0.81.

      B.3.5 Traditional methods


      Check conditions

Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.

1. Independent observations: The observations are collected independently.

   The cases are selected independently through random sampling so this condition is met.

2. Approximately normal: The number of expected successes and expected failures is at least 10.

   This condition is met: under the null hypothesis we expect \(100 \cdot 0.8 = 80\) successes and \(100 \cdot 0.2 = 20\) failures, and the observed counts of 73 and 27 are also both greater than 10.

      Test statistic

      +

      The test statistic is a random variable based on the sample data. Here, we want to look at a way to estimate the population proportion \(\pi\). A good guess is the sample proportion \(\hat{P}\). Recall that this sample proportion is actually a random variable that will vary as different samples are (theoretically, would be) collected. We are looking to see how likely is it for us to have observed a sample proportion of \(\hat{p}_{obs} = 0.73\) or larger assuming that the population proportion is 0.80 (assuming the null hypothesis is true). If the conditions are met and assuming \(H_0\) is true, we can standardize this original test statistic of \(\hat{P}\) into a \(Z\) statistic that follows a \(N(0, 1)\) distribution.

\[ Z =\dfrac{ \hat{P} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n} }} \sim N(0, 1) \]

Observed test statistic

      While one could compute this observed test statistic by “hand” by plugging the observed values into the formula, the focus here is on the set-up of the problem and in understanding which formula for the test statistic applies. The calculation has been done in R below for completeness though:

p_hat <- 0.73
p0 <- 0.8
n <- 100
(z_obs <- (p_hat - p0) / sqrt( (p0 * (1 - p0)) / n))

[1] -1.75

      We see here that the \(z_{obs}\) value is around -1.75. Our observed sample proportion of 0.73 is 1.75 standard errors below the hypothesized parameter value of 0.8.


      Visualize and compute \(p\)-value

elec %>% 
  specify(response = satisfy, success = "satisfied") %>% 
  hypothesize(null = "point", p = 0.8) %>% 
  calculate(stat = "z") %>% 
  visualize(method = "theoretical", obs_stat = z_obs, direction = "both")

2 * pnorm(z_obs)

[1] 0.0801

      The \(p\)-value—the probability of observing an \(z_{obs}\) value of -1.75 or more extreme (in both directions) in our null distribution—is around 8%.

      +

      Note that we could also do this test directly using the prop.test function.

stats::prop.test(x = table(elec$satisfy),
       n = length(elec$satisfy),
       alternative = "two.sided",
       p = 0.8,
       correct = FALSE)

    1-sample proportions test without continuity correction

data:  table(elec$satisfy), null probability 0.8
X-squared = 3, df = 1, p-value = 0.08
alternative hypothesis: true p is not equal to 0.8
95 percent confidence interval:
 0.636 0.807
sample estimates:
   p 
0.73 

prop.test does a \(\chi^2\) test here but this matches up exactly with what we would expect: \(\chi^2_{obs} = 3.06 = (-1.75)^2 = (z_{obs})^2\) and the \(p\)-values are the same because we are focusing on a two-tailed test.

      +

      Note that the 95 percent confidence interval given above matches well with the one calculated using bootstrapping.


      State conclusion

      +

We, therefore, do not have sufficient evidence to reject the null hypothesis. Our initial guess that our observed sample proportion was not statistically different from the hypothesized proportion has not been invalidated. Based on this sample, we do not have evidence that the proportion of all customers of the large electric utility satisfied with service they receive is different from 0.80, at the 5% level.


      B.3.6 Comparing results

      +

      Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions also being met leads us to better guess that using any of the methods whether they are traditional (formula-based) or non-traditional (computational-based) will lead to similar results.


      B.4 Two proportions


      B.4.1 Problem statement

A 2010 survey asked 827 randomly sampled registered voters in California “Do you support? Or do you oppose? Drilling for oil and natural gas off the Coast of California? Or do you not know enough to say?” Conduct a hypothesis test to determine if the data provide strong evidence that the proportion of college graduates who do not have an opinion on this issue is different than that of non-college graduates. (Tweaked a bit from Diez, Barr, and Çetinkaya-Rundel 2014 [Chapter 6])

      B.4.2 Competing hypotheses

In words

• Null hypothesis: There is no association between having an opinion on drilling and having a college degree for all registered California voters in 2010.

• Alternative hypothesis: There is an association between having an opinion on drilling and having a college degree for all registered California voters in 2010.

      Another way in words

• Null hypothesis: The probability that a California voter in 2010 who is a college graduate has no opinion on drilling is the same as that probability for a non-college graduate.

• Alternative hypothesis: These parameter probabilities are different.

      In symbols (with annotations)

• \(H_0: \pi_{college} = \pi_{no\_college}\) or \(H_0: \pi_{college} - \pi_{no\_college} = 0\), where \(\pi\) represents the probability of not having an opinion on drilling.

• \(H_A: \pi_{college} - \pi_{no\_college} \ne 0\)

      Set \(\alpha\)

It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      B.4.3 Exploring the sample data

offshore <- read_csv("https://moderndive.com/data/offshore.csv")

offshore %>% tabyl(college_grad, response)

 college_grad no opinion opinion
           no        131     258
          yes        104     334

off_summ <- offshore %>% 
  group_by(college_grad) %>% 
  summarize(prop_no_opinion = mean(response == "no opinion"),
    sample_size = n())

ggplot(offshore, aes(x = college_grad, fill = response)) +
  geom_bar(position = "fill") +
  coord_flip()

      Guess about statistical significance

      +

      We are looking to see if a difference exists in the size of the bars corresponding to no opinion for the plot. Based solely on the plot, we have little reason to believe that a difference exists since the bars seem to be about the same size, BUT…it’s important to use statistics to see if that difference is actually statistically significant!


      B.4.4 Non-traditional methods

Collecting summary info

The observed statistic is

d_hat <- offshore %>% 
  specify(response ~ college_grad, success = "no opinion") %>% 
  calculate(stat = "diff in props", order = c("yes", "no"))
d_hat

# A tibble: 1 x 1
     stat
    <dbl>
1 -0.0993

      Randomization for hypothesis test

In order to look to see if the observed sample proportion of no opinion for non-college graduates of 0.337 is statistically different than that for college graduates of 0.237, we need to account for the sample sizes. Note that this is the same as looking to see if \(\hat{p}_{grad} - \hat{p}_{nograd}\) is statistically different than 0. We also need to determine a process that replicates how the original group sizes of 389 and 438 were selected.

      We can use the idea of randomization testing (also known as permutation testing) to simulate the population from which the sample came (with two groups of different sizes) and then generate samples using shuffling from that simulated population to account for sampling variability.

set.seed(2018)
null_distn_two_props <- offshore %>% 
  specify(response ~ college_grad, success = "no opinion") %>%
  hypothesize(null = "independence") %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "diff in props", order = c("yes", "no"))

null_distn_two_props %>% visualize()

We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are less than or equal to -0.099 or greater than or equal to 0.099 for our \(p\)-value.

null_distn_two_props %>% 
  visualize(obs_stat = d_hat, direction = "two_sided")

Calculate \(p\)-value

pvalue <- null_distn_two_props %>% 
  get_pvalue(obs_stat = d_hat, direction = "two_sided")
pvalue

# A tibble: 1 x 1
  p_value
    <dbl>
1  0.0021

So our \(p\)-value is 0.002 and we reject the null hypothesis at the 5% level. You can also see from the histogram above that we are far into the tails of the null distribution.

      Bootstrapping for confidence interval

      +

      We can also create a confidence interval for the unknown population parameter \(\pi_{college} - \pi_{no\_college}\) using our sample data with bootstrapping.

boot_distn_two_props <- offshore %>% 
  specify(response ~ college_grad, success = "no opinion") %>%
  generate(reps = 10000) %>% 
  calculate(stat = "diff in props", order = c("yes", "no"))

ci <- boot_distn_two_props %>% 
  get_ci()
ci

# A tibble: 1 x 2
  `2.5%` `97.5%`
   <dbl>   <dbl>
1 -0.161 -0.0378

boot_distn_two_props %>% 
  visualize(endpoints = ci, direction = "between")

      We see that 0 is not contained in this confidence interval as a plausible value of \(\pi_{college} - \pi_{no\_college}\) (the unknown population parameter). This matches with our hypothesis test results of rejecting the null hypothesis. Since zero is not a plausible value of the population parameter, we have evidence that the proportion of college graduates in California with no opinion on drilling is different than that of non-college graduates.

Interpretation: We are 95% confident the true proportion of college graduates in California with no opinion on offshore drilling is between 0.04 and 0.16 lower than that of non-college graduates.


      B.4.5 Traditional methods

B.4.6 Check conditions

Remember that in order to use the short-cut (formula-based, theoretical) approach, we need to check that some conditions are met.

1. Independent observations: Each case that was selected must be independent of all the other cases selected.

   This condition is met since cases were selected at random to observe.

2. Sample size: The number of pooled successes and pooled failures must be at least 10 for each group. (A short sketch verifying these counts in R follows this list.)

   We need to first figure out the pooled success rate: \[\hat{p}_{obs} = \dfrac{131 + 104}{827} = 0.28.\] We now determine expected (pooled) success and failure counts:

   \(0.28 \cdot (131 + 258) = 108.92\), \(0.72 \cdot (131 + 258) = 280.08\)

   \(0.28 \cdot (104 + 334) = 122.64\), \(0.72 \cdot (104 + 334) = 315.36\)

3. Independent selection of samples: The cases are not paired in any meaningful way.

   We have no reason to suspect that a college graduate selected would have any relationship to a non-college graduate selected.
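Here is the short sketch referred to in condition 2 (an added illustration using the counts from the tabyl output above) that verifies the pooled expected counts:

# Pooled proportion of "no opinion" responses
p_pooled <- (131 + 104) / 827

n_no_grad <- 131 + 258   # non-college graduates
n_grad <- 104 + 334      # college graduates

# Expected pooled successes and failures in each group; all should be at least 10
c(p_pooled * n_no_grad, (1 - p_pooled) * n_no_grad,
  p_pooled * n_grad, (1 - p_pooled) * n_grad)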

      B.4.7 Test statistic

The test statistic is a random variable based on the sample data. Here, we are interested in seeing if our observed difference in sample proportions corresponding to no opinion on drilling (\(\hat{p}_{college, obs} - \hat{p}_{no\_college, obs}\) = -0.099) is statistically different than 0. Assuming that conditions are met and the null hypothesis is true, we can use the standard normal distribution to standardize the difference in sample proportions (\(\hat{P}_{college} - \hat{P}_{no\_college}\)) using the standard error of \(\hat{P}_{college} - \hat{P}_{no\_college}\) and the pooled estimate:

      \[ Z =\dfrac{ (\hat{P}_1 - \hat{P}_2) - 0}{\sqrt{\dfrac{\hat{P}(1 - \hat{P})}{n_1} + \dfrac{\hat{P}(1 - \hat{P})}{n_2} }} \sim N(0, 1) \] where \(\hat{P} = \dfrac{\text{total number of successes} }{ \text{total number of cases}}.\)
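To make the formula concrete, here is a small by-hand sketch (added for illustration; the appendix itself uses the infer package just below) that plugs in the counts from the table above:

p_grad <- 104 / 438      # proportion with no opinion among college graduates
p_no_grad <- 131 / 389   # proportion with no opinion among non-college graduates
p_pool <- (104 + 131) / 827

se <- sqrt(p_pool * (1 - p_pool) * (1 / 438 + 1 / 389))
(p_grad - p_no_grad) / se   # approximately -3.16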

Observed test statistic

While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and in understanding which formula for the test statistic applies. We can use the calculate() function from infer to compute the standardized statistic for us.

z_hat <- offshore %>% 
  specify(response ~ college_grad, success = "no opinion") %>% 
  calculate(stat = "z", order = c("yes", "no"))
z_hat

# A tibble: 1 x 1
   stat
  <dbl>
1 -3.16

      The observed difference in sample proportions is 3.16 standard deviations smaller than 0.

      +

      The \(p\)-value—the probability of observing a \(Z\) value of -3.16 or more extreme in our null distribution—is 0.0016. This can also be calculated in R directly:

2 * pnorm(-3.16, lower.tail = TRUE)

[1] 0.00158

      B.4.8 State conclusion

      +

      We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that a statistically significant difference did not exist in the proportions of no opinion on offshore drilling between college educated and non-college educated Californians was not validated. We do have evidence to suggest that there is a dependency between college graduation and position on offshore drilling for Californians.


      B.4.9 Comparing results

      +

Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions also being met leads us to better guess that using any of the methods, whether traditional (formula-based) or non-traditional (computational-based), will lead to similar results.

      B.5 Two means (independent samples)


      B.5.1 Problem statement

Average income varies from one region of the country to another, and it often reflects both lifestyles and regional living expenses. Suppose a new graduate is considering a job in two locations, Cleveland, OH and Sacramento, CA, and he wants to see whether the average income in one of these cities is higher than the other. He would like to conduct a hypothesis test based on two randomly selected samples from the 2000 Census. (Tweaked a bit from Diez, Barr, and Çetinkaya-Rundel 2014 [Chapter 5])

      B.5.2 Competing hypotheses

In words

• Null hypothesis: There is no association between income and location (Cleveland, OH and Sacramento, CA).

• Alternative hypothesis: There is an association between income and location (Cleveland, OH and Sacramento, CA).

      Another way in words

• Null hypothesis: The mean income is the same for both cities.

• Alternative hypothesis: The mean income is different for the two cities.

      In symbols (with annotations)

• \(H_0: \mu_{sac} = \mu_{cle}\) or \(H_0: \mu_{sac} - \mu_{cle} = 0\), where \(\mu\) represents the average income.

• \(H_A: \mu_{sac} - \mu_{cle} \ne 0\)

      Set \(\alpha\)

It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      B.5.3 Exploring the sample data

cle_sac <- read.delim("https://moderndive.com/data/cleSac.txt") %>%
  rename(metro_area = Metropolitan_area_Detailed,
         income = Total_personal_income) %>%
  na.omit()

inc_summ <- cle_sac %>% group_by(metro_area) %>%
  summarize(sample_size = n(),
    mean = mean(income),
    sd = sd(income),
    minimum = min(income),
    lower_quartile = quantile(income, 0.25),
    median = median(income),
    upper_quartile = quantile(income, 0.75),
    max = max(income))
kable(inc_summ)

metro_area       sample_size   mean     sd  minimum  lower_quartile  median  upper_quartile     max
Cleveland_ OH            212  27467  27681        0            8475   21000           35275  152400
Sacramento_ CA           175  32428  35774        0            8050   20000           49350  206900

The boxplot below also shows the mean for each group highlighted by the red dots.

ggplot(cle_sac, aes(x = metro_area, y = income)) +
  geom_boxplot() +
  stat_summary(fun.y = "mean", geom = "point", color = "red")

      Guess about statistical significance

      +

      We are looking to see if a difference exists in the mean income of the two levels of the explanatory variable. Based solely on the boxplot, we have reason to believe that no difference exists. The distributions of income seem similar and the means fall in roughly the same place.


      B.5.4 Non-traditional methods

Collecting summary info

We now compute the observed statistic:

d_hat <- cle_sac %>% 
  specify(income ~ metro_area) %>% 
  calculate(stat = "diff in means", 
            order = c("Sacramento_ CA", "Cleveland_ OH"))
d_hat

# A tibble: 1 x 1
   stat
  <dbl>
1 4960.

      Randomization for hypothesis test

In order to look to see if the observed sample mean for Sacramento of 32427.543 is statistically different than that for Cleveland of 27467.066, we need to account for the sample sizes. Note that this is the same as looking to see if \(\bar{x}_{sac} - \bar{x}_{cle}\) is statistically different than 0. We also need to determine a process that replicates how the original group sizes of 212 and 175 were selected.

      We can use the idea of randomization testing (also known as permutation testing) to simulate the population from which the sample came (with two groups of different sizes) and then generate samples using shuffling from that simulated population to account for sampling variability.

set.seed(2018)
null_distn_two_means <- cle_sac %>% 
  specify(income ~ metro_area) %>% 
  hypothesize(null = "independence") %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "diff in means",
            order = c("Sacramento_ CA", "Cleveland_ OH"))

null_distn_two_means %>% visualize()

      We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are greater than or equal to 4960.477 or less than or equal to -4960.477 for our \(p\)-value.

null_distn_two_means %>% 
  visualize(obs_stat = d_hat, direction = "both")

Calculate \(p\)-value

pvalue <- null_distn_two_means %>% 
  get_pvalue(obs_stat = d_hat, direction = "both")
pvalue

# A tibble: 1 x 1
  p_value
    <dbl>
1   0.124

So our \(p\)-value is 0.124 and we fail to reject the null hypothesis at the 5% level. You can also see from the histogram above that we are not very far into the tail of the null distribution.

      Bootstrapping for confidence interval

We can also create a confidence interval for the unknown population parameter \(\mu_{sac} - \mu_{cle}\) using our sample data with bootstrapping. Here we will bootstrap each of the groups with replacement instead of shuffling, keeping each group at its original size of 175 for Sacramento and 212 for Cleveland.

boot_distn_two_means <- cle_sac %>% 
  specify(income ~ metro_area) %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "diff in means",
            order = c("Sacramento_ CA", "Cleveland_ OH"))

ci <- boot_distn_two_means %>% 
  get_ci()
ci

# A tibble: 1 x 2
  `2.5%` `97.5%`
   <dbl>   <dbl>
1 -1446.  11308.

boot_distn_two_means %>% 
  visualize(endpoints = ci, direction = "between")

      We see that 0 is contained in this confidence interval as a plausible value of \(\mu_{sac} - \mu_{cle}\) (the unknown population parameter). This matches with our hypothesis test results of failing to reject the null hypothesis. Since zero is a plausible value of the population parameter, we do not have evidence that Sacramento incomes are different than Cleveland incomes.

Interpretation: We are 95% confident the true mean yearly income for those living in Sacramento is between 1445.53 dollars lower and 11307.82 dollars higher than for those living in Cleveland.

      +

      Note: You could also use the null distribution based on randomization with a shift to have its center at \(\bar{x}_{sac} - \bar{x}_{cle} = \$4960.48\) instead of at 0 and calculate its percentiles. The confidence interval produced via this method should be comparable to the one done using bootstrapping above.
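A minimal sketch of that idea (added here; it assumes the null_distn_two_means object created earlier):

# Shift the randomization distribution so it is centered at the observed difference,
# then take the 2.5th and 97.5th percentiles as an approximate 95% confidence interval
quantile(null_distn_two_means$stat + 4960.48, probs = c(0.025, 0.975))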


      B.5.5 Traditional methods

Check conditions

Remember that in order to use the short-cut (formula-based, theoretical) approach, we need to check that some conditions are met.

1. Independent observations: The observations are independent in both groups.

   This condition is met since the cases are randomly selected from each city.

2. Approximately normal: The distribution of the response for each group should be normal or the sample sizes should be at least 30.

ggplot(cle_sac, aes(x = income)) +
  geom_histogram(color = "white", binwidth = 20000) +
  facet_wrap(~ metro_area)

      We have some reason to doubt the normality assumption here since both the histograms show deviation from a normal model fitting the data well for each group. The sample sizes for each group are greater than 100 though so the assumptions should still apply.

3. Independent samples: The samples should be collected without any natural pairing.

   There is no mention of there being a relationship between those selected in Cleveland and in Sacramento.

      B.5.6 Test statistic

      +

      The test statistic is a random variable based on the sample data. Here, we are interested in seeing if our observed difference in sample means (\(\bar{x}_{sac, obs} - \bar{x}_{cle, obs}\) = 4960.477) is statistically different than 0. Assuming that conditions are met and the null hypothesis is true, we can use the \(t\) distribution to standardize the difference in sample means (\(\bar{X}_{sac} - \bar{X}_{cle}\)) using the approximate standard error of \(\bar{X}_{sac} - \bar{X}_{cle}\) (invoking \(S_{sac}\) and \(S_{cle}\) as estimates of unknown \(\sigma_{sac}\) and \(\sigma_{cle}\)).

      +

      \[ T =\dfrac{ (\bar{X}_1 - \bar{X}_2) - 0}{ \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}} } \sim t (df = min(n_1 - 1, n_2 - 1)) \] where 1 = Sacramento and 2 = Cleveland with \(S_1^2\) and \(S_2^2\) the sample variance of the incomes of both cities, respectively, and \(n_1 = 175\) for Sacramento and \(n_2 = 212\) for Cleveland.
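As an added illustration (the appendix itself uses infer just below), the pieces of this formula can be computed directly from the sample summary statistics:

cle_sac %>% 
  group_by(metro_area) %>% 
  summarize(xbar = mean(income), s2 = var(income), n = n()) %>% 
  summarize(t_obs = diff(xbar) / sqrt(sum(s2 / n)))

Note this returns roughly +1.50 (Sacramento minus Cleveland); the infer pipeline below uses the order Cleveland then Sacramento, which flips the sign.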

Observed test statistic

      Note that we could also do (ALMOST) this test directly using the t.test function. The x and y arguments are expected to both be numeric vectors here so we’ll need to appropriately filter our datasets.

cle_sac %>% 
  specify(income ~ metro_area) %>% 
  calculate(stat = "t",
            order = c("Cleveland_ OH", "Sacramento_ CA"))

# A tibble: 1 x 1
   stat
  <dbl>
1 -1.50

      We see here that the observed test statistic value is around -1.5.

While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and in understanding which formula for the test statistic applies.

      B.5.7 Compute \(p\)-value

      +

The \(p\)-value—the probability of observing a \(t_{174}\) value of -1.501 or more extreme (in both directions) in our null distribution—is 0.13. This can also be calculated in R directly:

2 * pt(-1.501, df = min(212 - 1, 175 - 1), lower.tail = TRUE)

[1] 0.135

      We can also approximate by using the standard normal curve:

2 * pnorm(-1.501)

[1] 0.133

      Note that the 95 percent confidence interval given above matches well with the one calculated using bootstrapping.


      B.5.8 State conclusion

      +

We, therefore, do not have sufficient evidence to reject the null hypothesis. Our initial guess that no statistically significant difference exists in the means was backed by this statistical analysis. We do not have evidence to suggest that the true mean income differs between Cleveland, OH and Sacramento, CA based on this data.


      B.5.9 Comparing results

      +

      Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions also being met leads us to better guess that using any of the methods whether they are traditional (formula-based) or non-traditional (computational-based) will lead to similar results.


      B.6 Two means (paired samples)


      Problem statement

      +

      Trace metals in drinking water affect the flavor and an unusually high concentration can pose a health hazard. Ten pairs of data were taken measuring zinc concentration in bottom water and surface water at 10 randomly selected locations on a stretch of river. Do the data suggest that the true average concentration in the surface water is smaller than that of bottom water? (Note that units are not given.) [Tweaked a bit from https://onlinecourses.science.psu.edu/stat500/node/51]


      B.6.1 Competing hypotheses

In words

• Null hypothesis: The mean concentration in the bottom water is the same as that of the surface water at different paired locations.

• Alternative hypothesis: The mean concentration in the surface water is smaller than that of the bottom water at different paired locations.

      In symbols (with annotations)

• \(H_0: \mu_{diff} = 0\), where \(\mu_{diff}\) represents the mean difference in concentration for surface water minus bottom water.

• \(H_A: \mu_{diff} < 0\)

      Set \(\alpha\)

It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      B.6.2 Exploring the sample data

zinc_tidy <- read_csv("https://moderndive.com/data/zinc_tidy.csv")

We want to look at the differences in surface - bottom for each location:

zinc_diff <- zinc_tidy %>% 
  group_by(loc_id) %>% 
  summarize(pair_diff = diff(concentration)) %>% 
  ungroup()

Next we calculate the mean difference as our observed statistic:

d_hat <- zinc_diff %>% 
  specify(response = pair_diff) %>% 
  calculate(stat = "mean")
d_hat

# A tibble: 1 x 1
     stat
    <dbl>
1 -0.0804

The histogram below also shows the distribution of pair_diff.

ggplot(zinc_diff, aes(x = pair_diff)) +
  geom_histogram(binwidth = 0.04, color = "white")

      Guess about statistical significance

      +

      We are looking to see if the sample paired mean difference of -0.08 is statistically less than 0. They seem to be quite close, but we have a small number of pairs here. Let’s guess that we will fail to reject the null hypothesis.


      B.6.3 Non-traditional methods

Bootstrapping for hypothesis test

In order to look to see if the observed sample mean difference \(\bar{x}_{diff} = -0.0804\) is statistically less than 0, we need to account for the number of pairs. We also need to determine a process that replicates how the paired data was selected in a way similar to how we calculated our original difference in sample means.

      +

      Treating the differences as our data of interest, we next use the process of bootstrapping to build other simulated samples and then calculate the mean of the bootstrap samples. We hypothesize that the mean difference is zero.

      +

      This process is similar to comparing the One Mean example seen above, but using the differences between the two groups as a single sample with a hypothesized mean difference of 0.

set.seed(2018)
null_distn_paired_means <- zinc_diff %>% 
  specify(response = pair_diff) %>% 
  hypothesize(null = "point", mu = 0) %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "mean")

null_distn_paired_means %>% visualize()

We can next use this distribution to observe our \(p\)-value. Recall this is a left-tailed test so we will be looking for values that are less than or equal to -0.0804 for our \(p\)-value.

null_distn_paired_means %>% 
  visualize(obs_stat = d_hat, direction = "less")

Calculate \(p\)-value

pvalue <- null_distn_paired_means %>% 
  get_pvalue(obs_stat = d_hat, direction = "less")
pvalue

# A tibble: 1 x 1
  p_value
    <dbl>
1       0

So our \(p\)-value is essentially 0 and we reject the null hypothesis at the 5% level. You can also see from the histogram above that we are far into the left tail of the null distribution.

      Bootstrapping for confidence interval

We can also create a confidence interval for the unknown population parameter \(\mu_{diff}\) using our sample data (the calculated differences) with bootstrapping. This is similar to the bootstrapping done in a one sample mean case, except now our data is differences instead of raw numerical data. Note that this code is identical to the pipeline shown in the hypothesis test above except the hypothesize() function is not called.

boot_distn_paired_means <- zinc_diff %>% 
  specify(response = pair_diff) %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "mean")

ci <- boot_distn_paired_means %>% 
  get_ci()
ci

# A tibble: 1 x 2
  `2.5%` `97.5%`
   <dbl>   <dbl>
1 -0.112 -0.0503

boot_distn_paired_means %>% 
  visualize(endpoints = ci, direction = "between")

      We see that 0 is not contained in this confidence interval as a plausible value of \(\mu_{diff}\) (the unknown population parameter). This matches with our hypothesis test results of rejecting the null hypothesis. Since zero is not a plausible value of the population parameter and since the entire confidence interval falls below zero, we have evidence that surface zinc concentration levels are lower, on average, than bottom level zinc concentrations.

Interpretation: We are 95% confident the true mean zinc concentration on the surface is between 0.05 and 0.11 units smaller than on the bottom.

      B.6.4 Traditional methods

Check conditions

Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.

1. Independent observations: The observations among pairs are independent.

   The locations are selected independently through random sampling so this condition is met.

2. Approximately normal: The distribution of the population of differences is normal or the number of pairs is at least 30.

   The histogram above does show some skew so we have reason to doubt the population being normal based on this sample. We also only have 10 pairs which is fewer than the 30 needed. A theory-based test may not be valid here.

      Test statistic

      +

The test statistic is a random variable based on the sample data. Here, we want to look at a way to estimate the population mean difference \(\mu_{diff}\). A good guess is the sample mean difference \(\bar{X}_{diff}\). Recall that this sample mean is actually a random variable that will vary as different samples are (theoretically, would be) collected. We are looking to see how likely is it for us to have observed a sample mean difference of \(\bar{x}_{diff, obs} = -0.0804\) or smaller assuming that the population mean difference is 0 (assuming the null hypothesis is true). If the conditions are met and assuming \(H_0\) is true, we can “standardize” this original test statistic of \(\bar{X}_{diff}\) into a \(T\) statistic that follows a \(t\) distribution with degrees of freedom equal to \(df = n - 1\):

      +

      \[ T =\dfrac{ \bar{X}_{diff} - 0}{ S_{diff} / \sqrt{n} } \sim t (df = n - 1) \]

      +

      where \(S\) represents the standard deviation of the sample differences and \(n\) is the number of pairs.
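As a quick check (an added sketch using the zinc_diff data frame created above), this statistic can be computed directly from the differences and should match the t_test() output below:

zinc_diff %>% 
  summarize(t_obs = mean(pair_diff) / (sd(pair_diff) / sqrt(n())))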

Observed test statistic

      While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and in understanding which formula for the test statistic applies. We can use the t_test function on the differences to perform this analysis for us.

t_test_results <- zinc_diff %>% 
  infer::t_test(formula = pair_diff ~ NULL, 
         alternative = "less",
         mu = 0)
t_test_results

# A tibble: 1 x 6
  statistic  t_df  p_value alternative lower_ci upper_ci
      <dbl> <dbl>    <dbl> <chr>          <dbl>    <dbl>
1     -4.86     9 0.000446 less            -Inf  -0.0501

      We see here that the \(t_{obs}\) value is -4.864.


      Compute \(p\)-value

      +

The \(p\)-value—the probability of observing a \(t_{obs}\) value of -4.864 or less in our null distribution of a \(t\) with 9 degrees of freedom—is essentially 0. This can also be calculated in R directly:

pt(-4.8638, df = nrow(zinc_diff) - 1, lower.tail = TRUE)

[1] 0.000446

      State conclusion

      +

      We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that our observed sample mean difference was not statistically less than the hypothesized mean of 0 has been invalidated here. Based on this sample, we have evidence that the mean concentration in the bottom water is greater than that of the surface water at different paired locations.


      B.6.5 Comparing results

      +

      Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions were not met since the number of pairs was small, but the sample data was not highly skewed. Using any of the methods whether they are traditional (formula-based) or non-traditional (computational-based) lead to similar results here.

diff --git a/previous_versions/v0.4.0/C-appendixC.html b/previous_versions/v0.4.0/C-appendixC.html
new file mode 100644
index 000000000..c8c82b6fe
(Archived HTML build of the appendix; navigation and page boilerplate omitted. Page title: "C Reach for the Stars | An Introduction to Statistical and Data Sciences via R")

      C Reach for the Stars


      Needed packages

library(dplyr)
library(ggplot2)
library(knitr)
library(dygraphs)
library(nycflights13)

      C.1 Sorted barplots

      +

      Building upon the example in Section 3.8:

flights_table <- table(flights$carrier)
flights_table

   9E    AA    AS    B6    DL    EV    F9    FL    HA    MQ    OO    UA    US 
18460 32729   714 54635 48110 54173   685  3260   342 26397    32 58665 20536 
   VX    WN    YV 
 5162 12275   601 

      We can sort this table from highest to lowest counts by using the sort function:

sorted_flights <- sort(flights_table, decreasing = TRUE)
names(sorted_flights)

 [1] "UA" "B6" "EV" "DL" "AA" "MQ" "US" "9E" "WN" "VX" "FL" "AS" "F9" "YV" "HA"
[16] "OO"

      It is often preferred for barplots to be ordered corresponding to the heights of the bars. This allows the reader to more easily compare the ordering of different airlines in terms of departed flights (Robbins 2013). We can also much more easily answer questions like “How many airlines have more departing flights than Southwest Airlines?”.
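For instance (a small added check using the sorted table), that question can be answered directly:

# How many carriers had more departing flights than Southwest Airlines (WN)?
sum(sorted_flights > sorted_flights["WN"])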

      +

      We can use the sorted table giving the number of flights defined as sorted_flights to reorder the carrier.

ggplot(data = flights, mapping = aes(x = carrier)) +
  geom_bar() +
  scale_x_discrete(limits = names(sorted_flights))

Figure C.1: Number of flights departing NYC in 2013 by airline - Descending numbers

      The last addition here specifies the values of the horizontal x axis on a discrete scale to correspond to those given by the entries of sorted_flights.
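An alternative approach, sketched below, is to reorder the factor itself rather than overriding the scale: fct_infreq() from the forcats package (not loaded above, hence the :: prefix) orders a factor's levels by decreasing frequency, which produces the same descending barplot:

# Same sorted barplot, reordering the factor levels by frequency
ggplot(data = flights, mapping = aes(x = forcats::fct_infreq(carrier))) +
  geom_bar() +
  labs(x = "carrier")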


      C.2 Interactive graphics


      C.2.1 Interactive linegraphs


      Another useful tool for viewing linegraphs such as this is the dygraph function in the dygraphs package in combination with the dyRangeSelector function. This allows us to zoom in on a selected range and get an interactive plot for us to work with:

library(dygraphs)
flights_day <- mutate(flights, date = as.Date(time_hour))
flights_summarized <- flights_day %>% 
  group_by(date) %>%
  summarize(median_arr_delay = median(arr_delay, na.rm = TRUE))
rownames(flights_summarized) <- flights_summarized$date
flights_summarized <- select(flights_summarized, -date)
dyRangeSelector(dygraph(flights_summarized))



The syntax here is a little different than what we have covered so far. The dygraph function expects the dates to be given as the rownames of the object. We then remove the date variable from the flights_summarized data frame since it is accounted for in the rownames. Lastly, we run the dygraph function on the new data frame that only contains the median arrival delay as a column, and then provide the ability to zoom in on the interactive plot via dyRangeSelector. (Note that this plot will only be interactive in the HTML version of this book.)
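A common alternative, sketched here under the assumption that the xts package is installed, is to build an xts time-series object, which dygraph accepts directly and which avoids the rownames workaround. flights_daily below is just a re-summarized copy that keeps its date column:

library(xts)
# Re-summarize, keeping the date column this time
flights_daily <- flights_day %>% 
  group_by(date) %>%
  summarize(median_arr_delay = median(arr_delay, na.rm = TRUE))
# An xts object carries its own date index, so no rownames trick is needed
delays_xts <- xts(flights_daily$median_arr_delay, order.by = flights_daily$date)
dyRangeSelector(dygraph(delays_xts))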

diff --git a/previous_versions/v0.4.0/data/ageAtMar.csv b/previous_versions/v0.4.0/data/ageAtMar.csv
new file mode 100755
index 000000000..b68e12a1b
--- /dev/null
+++ b/previous_versions/v0.4.0/data/ageAtMar.csv
@@ -0,0 +1,5535 @@
+age
[raw age values elided]
diff --git a/previous_versions/v0.4.0/data/cleSac.txt b/previous_versions/v0.4.0/data/cleSac.txt
new file mode 100755
index 000000000..20e7da082
--- /dev/null
+++ b/previous_versions/v0.4.0/data/cleSac.txt
@@ -0,0 +1 @@
+Census_year State_FIPS_code Metropolitan_area_Detailed Age Sex Race_General Marital_status Total_personal_income [raw census records elided]
\ No newline at end of file
diff --git a/previous_versions/v0.4.0/data/dem_score.csv b/previous_versions/v0.4.0/data/dem_score.csv
new file mode 100755
index 000000000..c48fc1f49
--- /dev/null
+++ b/previous_versions/v0.4.0/data/dem_score.csv
@@ -0,0 +1,97 @@
+country,1952,1957,1962,1967,1972,1977,1982,1987,1992
[raw country rows elided]
diff --git a/previous_versions/v0.4.0/data/dem_score.xlsx b/previous_versions/v0.4.0/data/dem_score.xlsx
new file mode 100755
index 000000000..85d90daa9
Binary files /dev/null and b/previous_versions/v0.4.0/data/dem_score.xlsx differ
diff --git a/previous_versions/v0.4.0/data/ideology.csv b/previous_versions/v0.4.0/data/ideology.csv
new file mode 100755
index 000000000..302957298
--- /dev/null
+++ b/previous_versions/v0.4.0/data/ideology.csv
@@ -0,0 +1,76 @@
+city,state,state_ideology
[raw city rows elided]
Calif.",California,Liberal +"Orlando, Fla.",Florida,Conservative +"Oklahoma City, Okla.",Oklahoma,Conservative +Seattle,Washington,Liberal +"Kansas City, Mo.",Missouri,Conservative +"Nashville, Tenn.",Tennessee,Conservative +"Laredo, Texas",Texas,Conservative +"Fort Worth, Texas",Texas,Conservative +"Louisville, Ky.",Kentucky,Conservative +"Norfolk, Va.",Virginia,Liberal +"Arlington, Va.",Virginia,Liberal +Pittsburgh,Pennsylvania,Conservative +"Albuquerque, N.M.",New Mexico,Liberal +"Jersey City, N.J.",New Jersey,Liberal +"Raleigh, N.C.",North Carolina,Conservative +"Rochester, N.Y.",New York,Liberal +Cincinnati,Ohio,Conservative +"Long Beach, Calif.",California,Liberal +"Birmingham, Ala.",Alabama,Conservative +"Wichita, Kan.",Kansas,Conservative +"Virginia Beach, Va.",Virginia,Liberal +"Fresno, Calif.",California,Liberal +"Buffalo, N.Y.",New York,Liberal +Minneapolis,Minneapolis,Liberal +"Portland, Ore.",Oregon,Liberal +"Reno, Nev.",Nevada,Liberal +"Richmond, Va.",Virginia,Liberal +"Baton Rouge, La.",Louisiana,Conservative +"Jackson, Miss.",Mississippi,Conservative +"Riverside, Calif.",California,Liberal +"Fort Lauderdale, Fla.",Florida,Conservative +St. Louis,Missouri,Conservative +"Brownsville, Texas",Texas,Conservative +"Albany, N.Y.",New York,Liberal +"Colorado Springs, Colo.",Colorado,Liberal +"Savannah, Ga.",Georgia,Conservative +"Winston-Salem, N.C.",North Carolina,Conservative +"Toledo, Ohio",Ohio,Conservative +"Madison, Wis.",Wisconsin,Conservative +"Corpus Christi, Texas",Texas,Conservative +"San Bernardino, Calif.",California,Liberal \ No newline at end of file diff --git a/previous_versions/v0.4.0/data/le_mess.csv b/previous_versions/v0.4.0/data/le_mess.csv new file mode 100755 index 000000000..7cc6fb6fc --- /dev/null +++ b/previous_versions/v0.4.0/data/le_mess.csv @@ -0,0 +1,203 @@ +country,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016 +Afghanistan,27.13,27.67,28.19,28.73,29.27,29.8,30.34,30.86,31.4,31.94,32.47,33.01,33.53,34.07,34.6,35.13,35.66,36.17,36.69,37.2,37.7,38.19,38.67,39.14,39.61,40.07,40.53,40.98,41.46,41.96,42.51,43.11,43.75,44.45,45.21,46.02,46.87,47.74,48.62,49.5,49.3,49.4,49.5,48.9,49.4,49.7,49.5,48.6,50.0,50.1,50.4,51.0,51.4,51.8,52.0,52.1,52.4,52.8,53.3,53.6,54.0,54.4,54.8,54.9,53.8,52.72 +Albania,54.72,55.23,55.85,56.59,57.45,58.42,59.48,60.6,61.75,62.87,63.92,64.84,65.6,66.18,66.59,66.88,67.11,67.32,67.55,67.83,68.16,68.53,68.93,69.35,69.77,70.17,70.54,70.86,71.14,71.39,71.63,71.88,72.15,72.42,72.71,72.96,73.14,73.25,73.3,73.3,73.4,73.6,73.6,73.6,73.7,73.8,74.1,74.2,74.2,74.7,75.1,75.5,75.7,75.9,76.2,76.4,76.6,76.8,77.0,77.2,77.4,77.5,77.7,77.9,78.0,78.1 +Algeria,43.03,43.5,43.96,44.44,44.93,45.44,45.94,46.45,46.97,47.5,48.02,48.55,49.07,49.58,50.09,50.58,51.05,51.49,51.95,52.41,52.88,53.38,53.91,54.52,55.24,56.11,57.13,58.28,59.56,60.92,62.31,63.69,64.97,66.15,67.18,68.04,68.75,69.33,69.81,70.2,70.5,70.9,71.2,71.4,71.6,72.1,72.4,72.6,73.0,73.3,73.5,73.8,73.9,74.4,74.8,75.0,75.3,75.5,75.7,76.0,76.1,76.2,76.3,76.3,76.4,76.5 
+Angola,31.05,31.59,32.14,32.69,33.24,33.78,34.33,34.88,35.43,35.98,36.53,37.08,37.63,38.18,38.74,39.28,39.84,40.39,40.95,41.5,42.06,42.62,43.17,43.71,44.22,44.68,45.12,45.5,45.84,46.14,46.42,46.69,46.96,47.23,47.5,47.75,47.99,48.2,48.4,48.6,49.3,49.6,48.4,50.0,50.9,51.3,51.7,51.8,51.8,52.3,52.5,53.3,53.9,54.5,55.2,55.7,56.2,56.7,57.1,57.6,58.1,58.5,58.8,59.2,59.6,60.0 +Antigua and Barbuda,58.26,58.8,59.34,59.87,60.41,60.93,61.45,61.97,62.48,62.97,63.46,63.93,64.38,64.81,65.23,65.63,66.03,66.41,66.81,67.19,67.56,67.94,68.3,68.64,68.99,69.32,69.64,69.96,70.28,70.59,70.9,71.22,71.52,71.82,72.13,72.42,72.7,72.97,73.24,73.5,73.6,73.5,73.4,73.4,73.5,73.5,73.9,74.1,74.0,73.8,74.1,74.3,74.5,74.6,74.9,74.9,75.3,75.5,75.7,75.8,75.9,76.1,76.2,76.3,76.4,76.5 +Argentina,61.93,62.54,63.1,63.59,64.03,64.41,64.73,65.0,65.22,65.39,65.53,65.64,65.74,65.84,65.95,66.08,66.26,66.47,66.72,67.01,67.32,67.64,67.96,68.28,68.6,68.92,69.24,69.57,69.89,70.2,70.51,70.78,71.04,71.26,71.46,71.66,71.84,72.05,72.26,72.5,72.7,72.8,73.1,73.4,73.5,73.5,73.6,73.8,73.9,74.2,74.3,74.3,74.5,75.0,75.3,75.3,75.2,75.4,75.6,75.8,76.0,76.1,76.2,76.3,76.5,76.7 +Armenia,62.67,63.13,63.6,64.07,64.54,65.0,65.45,65.92,66.39,66.86,67.33,67.82,68.3,68.78,69.26,69.74,70.22,70.67,71.1,71.47,71.79,72.02,72.19,72.28,72.33,72.38,72.44,72.53,72.63,72.72,72.73,72.64,72.43,72.1,71.7,71.24,70.82,70.46,70.22,70.1,69.7,68.8,68.3,68.6,69.1,69.4,70.0,70.5,70.8,71.3,71.4,71.6,71.5,71.8,71.8,71.7,72.3,72.3,72.6,73.0,73.5,73.9,74.3,74.5,74.7,74.9 +Aruba,58.96,60.01,60.98,61.87,62.69,63.42,64.09,64.68,65.2,65.66,66.07,66.44,66.79,67.11,67.44,67.76,68.1,68.44,68.78,69.14,69.5,69.85,70.19,70.52,70.83,71.14,71.44,71.74,72.02,72.29,72.54,72.75,72.93,73.07,73.18,73.26,73.33,73.38,73.43,73.47,73.51,73.54,73.57,73.6,73.62,73.65,73.67,73.7,73.73,73.78,73.85,73.94,74.05,74.18,74.32,74.47,74.62,74.77,74.92,75.06,75.19,75.32,75.46,75.59,75.72,75.85 +Australia,68.71,69.11,69.69,69.84,70.16,70.03,70.31,70.86,70.43,70.87,71.14,70.91,70.97,70.63,70.96,70.79,71.07,70.7,71.11,70.78,71.38,71.9,72.11,71.86,72.81,72.84,73.45,73.84,74.4,74.56,74.92,74.7,75.51,75.98,75.41,76.08,76.27,76.3,76.4,77.0,77.4,77.6,77.9,78.1,78.3,78.5,78.8,79.2,79.4,79.8,80.1,80.3,80.6,80.9,81.2,81.4,81.5,81.6,81.8,82.0,82.2,82.4,82.4,82.3,82.3,82.3 +Austria,65.24,66.78,67.27,67.3,67.58,67.7,67.46,68.46,68.39,68.75,69.72,69.51,69.64,70.13,69.92,70.22,70.1,70.25,70.02,70.07,70.27,70.59,71.16,71.15,71.28,71.77,72.12,72.2,72.51,72.64,72.96,73.12,73.19,73.73,73.95,74.43,74.86,75.34,75.43,75.7,75.8,76.0,76.2,76.5,76.8,77.1,77.6,77.8,78.0,78.2,78.6,78.8,79.0,79.4,79.5,80.0,80.1,80.4,80.3,80.5,80.7,80.9,81.1,81.2,81.3,81.4 +Azerbaijan,57.5,57.93,58.36,58.79,59.21,59.63,60.05,60.48,60.9,61.33,61.76,62.2,62.62,63.06,63.49,63.91,64.35,64.75,65.14,65.48,65.75,65.93,66.04,66.05,66.02,65.92,65.8,65.68,65.6,65.55,65.61,65.73,65.92,66.15,66.37,66.48,66.46,66.28,65.98,65.6,65.3,63.7,64.0,63.5,64.6,65.0,65.3,65.6,65.9,66.5,67.2,67.6,67.6,67.8,68.2,68.7,69.1,69.2,69.7,70.1,70.8,71.5,72.1,72.5,72.9,73.3 +Bahamas,58.91,59.29,59.67,60.03,60.39,60.72,61.06,61.38,61.69,62.0,62.29,62.58,62.85,63.13,63.4,63.65,63.91,64.14,64.39,64.61,64.85,65.08,65.3,65.53,65.74,65.96,66.16,66.37,66.57,66.75,66.95,67.12,67.31,67.5,67.67,67.86,68.02,68.2,68.35,68.5,68.9,69.2,69.7,69.5,69.7,70.0,70.2,70.1,70.1,70.2,70.3,70.4,71.1,71.7,71.7,72.0,71.8,72.2,72.7,72.7,72.6,72.7,72.9,73.5,73.7,73.9 
+Bahrain,41.45,42.32,43.26,44.27,45.35,46.49,47.7,48.97,50.29,51.64,52.99,54.33,55.64,56.9,58.1,59.23,60.29,61.29,62.22,63.1,63.92,64.67,65.38,66.03,66.63,67.2,67.72,68.21,68.67,69.09,69.47,69.83,70.16,70.46,70.73,70.98,71.2,71.41,71.61,71.8,72.0,72.1,72.5,72.9,73.0,73.4,73.8,74.0,74.2,73.7,74.3,74.8,75.3,75.7,76.1,76.3,77.0,77.6,78.2,78.7,78.8,79.0,79.1,79.1,79.1,79.1 +Bangladesh,42.58,42.87,43.19,43.54,43.91,44.3,44.73,45.19,45.68,46.2,46.73,47.28,47.81,48.29,48.6,48.63,48.37,47.83,47.09,46.31,45.74,45.52,45.77,46.49,47.58,48.92,50.27,51.47,52.44,53.18,53.72,54.15,54.57,55.0,55.47,55.96,56.46,56.94,57.42,57.9,56.4,59.7,60.5,61.2,61.6,62.4,63.2,63.9,64.6,64.9,65.4,65.8,66.3,66.8,67.1,67.5,67.7,68.3,68.6,68.8,69.3,69.4,69.8,70.1,70.4,70.7 +Barbados,56.82,57.41,57.99,58.56,59.13,59.67,60.22,60.76,61.28,61.8,62.31,62.79,63.27,63.74,64.2,64.64,65.08,65.5,65.91,66.31,66.71,67.09,67.47,67.83,68.17,68.53,68.87,69.22,69.57,69.91,70.25,70.58,70.91,71.23,71.54,71.85,72.14,72.43,72.72,73.0,73.2,73.2,73.1,73.0,73.3,73.7,73.9,74.1,74.2,74.0,74.4,74.6,74.8,74.9,75.0,75.0,75.1,75.3,75.3,75.2,75.2,75.4,75.5,75.6,75.7,75.8 +Belarus,65.11,65.54,65.96,66.37,66.77,67.16,67.52,67.88,68.82,71.59,72.3,71.01,71.66,73.17,72.7,73.05,72.78,72.88,72.47,71.94,72.56,72.26,72.29,72.57,71.63,71.46,71.39,71.23,70.82,70.57,70.84,70.95,70.73,70.09,70.28,71.66,71.55,71.28,71.05,70.5,70.1,69.6,68.9,68.6,68.2,68.1,68.0,67.9,67.7,68.1,68.0,67.9,68.2,68.5,68.7,69.1,69.7,70.0,70.1,70.2,70.3,70.4,70.6,70.7,71.0,71.3 +Belgium,66.77,67.97,68.33,68.59,68.54,68.83,69.19,69.88,70.28,69.59,70.46,70.19,70.0,70.66,70.51,70.58,70.86,70.55,70.63,70.89,71.01,71.35,71.56,71.91,71.9,72.05,72.7,72.64,73.13,73.18,73.59,73.81,73.81,74.31,74.41,74.61,75.22,75.53,75.59,76.0,76.2,76.3,76.5,76.6,76.9,77.2,77.4,77.5,77.7,77.8,78.0,78.2,78.5,79.0,79.1,79.5,79.5,79.6,79.8,80.1,80.2,80.3,80.4,80.5,80.5,80.5 +Belize,55.15,55.7,56.27,56.82,57.37,57.91,58.46,58.99,59.54,60.08,60.64,61.2,61.78,62.36,62.95,63.53,64.11,64.67,65.21,65.72,66.21,66.66,67.11,67.52,67.93,68.32,68.7,69.06,69.43,69.78,70.13,70.47,70.8,71.09,71.34,71.51,71.6,71.61,71.54,71.4,71.2,71.1,70.8,70.6,70.5,70.4,69.7,69.5,69.3,69.0,68.8,69.3,69.6,69.9,70.0,70.3,70.6,70.7,70.9,71.2,71.2,71.3,71.3,71.5,71.7,71.9 +Benin,33.53,34.09,34.64,35.19,35.72,36.25,36.77,37.28,37.79,38.29,38.8,39.32,39.85,40.38,40.93,41.5,42.09,42.69,43.31,43.93,44.55,45.16,45.77,46.36,46.93,47.46,47.96,48.43,48.88,49.34,49.84,50.38,50.97,51.62,52.33,53.09,53.89,54.67,55.42,56.1,56.3,56.6,56.9,56.8,56.7,56.6,56.9,57.0,57.1,57.2,57.4,57.7,57.9,58.2,58.6,58.9,59.2,59.7,60.4,60.8,61.1,61.4,61.7,62.0,62.3,62.6 +Bhutan,30.94,31.47,32.01,32.56,33.12,33.68,34.25,34.81,35.38,35.94,36.49,37.04,37.57,38.12,38.68,39.28,39.94,40.66,41.45,42.31,43.23,44.2,45.18,46.2,47.21,48.22,49.22,50.21,51.18,52.12,53.05,53.96,54.87,55.78,56.69,57.61,58.54,59.48,60.44,61.4,61.9,62.4,62.8,63.1,63.8,64.7,65.1,65.6,66.5,65.9,67.5,68.1,68.5,68.9,69.3,69.8,70.3,70.7,70.9,71.4,71.7,71.9,72.2,72.4,72.7,73.0 +Bolivia,40.6,40.94,41.28,41.64,41.98,42.34,42.7,43.05,43.41,43.77,44.14,44.5,44.88,45.24,45.62,45.99,46.34,46.69,47.05,47.44,47.86,48.34,48.89,49.5,50.19,50.93,51.73,52.54,53.38,54.21,55.04,55.87,56.67,57.47,58.22,58.96,59.65,60.33,60.98,61.6,62.2,62.7,63.2,63.8,64.4,65.1,65.6,66.3,66.9,67.6,68.3,68.7,69.3,69.8,70.2,70.6,70.9,71.2,71.6,71.8,72.1,72.4,72.7,72.9,73.2,73.5 +Bosnia and 
Herzegovina,53.22,54.49,55.7,56.85,57.94,58.97,59.95,60.87,61.74,62.56,63.34,64.07,64.78,65.46,66.14,66.81,67.47,68.14,68.82,69.49,70.17,70.84,71.49,72.12,72.71,73.24,73.71,74.12,74.48,74.82,75.2,75.65,76.15,76.63,76.95,76.89,76.37,75.39,74.07,72.7,72.7,68.0,68.3,71.1,67.0,73.8,74.4,74.8,75.3,75.7,76.2,76.4,76.7,76.9,77.0,77.1,77.3,77.5,77.7,77.9,78.2,78.4,78.6,78.7,78.9,79.1 +Botswana,46.87,47.27,47.66,48.05,48.45,48.84,49.23,49.61,49.99,50.34,50.7,51.02,51.35,51.67,52.0,52.36,52.77,53.23,53.73,54.3,54.9,55.54,56.18,56.82,57.45,58.07,58.65,59.21,59.74,60.24,60.73,61.21,61.67,62.08,62.44,62.7,62.85,62.85,62.69,62.3,62.0,61.2,60.1,58.6,56.8,54.8,52.9,50.9,49.2,47.6,46.5,45.6,45.7,46.9,49.3,51.2,52.4,53.2,54.3,55.6,56.5,56.5,56.9,57.3,58.7,60.13 +Brazil,50.59,51.1,51.62,52.14,52.66,53.19,53.71,54.23,54.75,55.27,55.78,56.27,56.75,57.21,57.66,58.07,58.49,58.91,59.31,59.73,60.14,60.56,60.98,61.41,61.84,62.27,62.68,63.07,63.45,63.81,64.18,64.55,64.94,65.34,65.76,66.18,66.6,67.04,67.47,67.9,68.1,68.3,68.5,68.8,69.0,69.3,69.6,69.9,70.3,70.7,71.1,71.4,71.7,72.0,72.4,72.7,73.0,73.2,73.4,73.6,73.8,74.0,74.1,74.3,74.4,74.5 +Brunei,56.99,57.6,58.22,58.83,59.45,60.07,60.7,61.31,61.93,62.52,63.11,63.67,64.21,64.72,65.21,65.67,66.12,66.54,66.97,67.38,67.79,68.19,68.58,68.95,69.32,69.67,70.01,70.33,70.65,70.95,71.25,71.54,71.84,72.12,72.41,72.69,72.98,73.26,73.54,73.8,73.8,74.0,74.2,74.4,74.7,74.9,75.2,75.6,75.8,75.9,76.1,76.3,76.5,76.7,76.7,76.8,76.8,76.9,77.0,77.1,76.9,76.9,76.9,77.1,77.1,77.1 +Bulgaria,60.65,59.62,64.16,64.43,64.84,65.24,66.64,68.74,66.6,69.22,70.26,69.55,70.38,71.18,71.35,71.28,70.47,71.3,70.48,71.32,70.93,70.96,71.4,71.26,71.11,71.44,70.88,71.24,71.34,71.17,71.56,71.16,71.33,71.43,71.15,71.63,71.42,71.49,71.55,71.4,71.3,71.2,71.1,70.9,71.0,70.9,70.6,71.0,71.4,71.6,71.8,72.1,72.3,72.5,72.6,72.7,72.9,73.2,73.5,73.7,74.2,74.5,74.6,74.7,74.8,74.9 +Burkina Faso,30.65,31.18,31.69,32.21,32.72,33.21,33.71,34.21,34.71,35.21,35.72,36.23,36.75,37.27,37.8,38.3,38.8,39.3,39.78,40.27,40.75,41.25,41.78,42.36,43.0,43.74,44.61,45.56,46.58,47.61,48.58,49.45,50.17,50.71,51.08,51.28,51.38,51.42,51.42,51.4,51.4,51.3,51.3,51.3,51.3,51.5,51.6,51.8,52.2,52.6,53.2,53.8,54.5,55.1,55.9,56.6,57.4,58.0,58.5,59.0,59.5,59.9,60.3,60.6,60.9,61.2 +Burundi,38.19,38.45,38.72,38.98,39.25,39.51,39.77,40.04,40.3,40.58,40.85,41.13,41.41,41.69,41.95,42.18,42.37,42.52,42.65,42.76,42.91,43.1,43.35,43.66,44.02,44.43,44.84,45.24,45.6,45.93,46.22,46.49,46.75,46.95,47.08,47.05,46.88,46.54,46.1,45.6,45.4,45.3,45.1,45.0,44.5,44.3,45.0,45.5,46.3,46.7,48.4,49.8,51.3,53.0,54.7,56.4,57.9,59.1,60.0,60.4,60.8,61.1,61.3,61.4,61.4,61.4 +Cambodia,40.5,40.81,41.08,41.32,41.52,41.7,41.86,41.99,42.14,42.29,42.47,42.7,42.95,43.2,43.45,43.73,44.0,44.13,44.03,43.28,41.67,39.73,37.58,34.94,21.69,19.04,18.1,19.55,21.91,28.16,38.0,44.24,49.43,53.22,55.5,56.49,56.82,56.99,57.22,57.6,57.9,58.2,58.1,58.0,58.1,58.3,58.7,59.0,59.5,60.0,60.8,61.6,62.4,63.2,64.0,64.8,65.4,66.1,66.6,67.0,67.6,68.2,68.7,69.1,69.4,69.7 +Cameroon,39.08,39.51,39.94,40.41,40.87,41.37,41.88,42.39,42.93,43.46,44.0,44.53,45.07,45.59,46.13,46.67,47.22,47.79,48.37,48.97,49.59,50.22,50.85,51.49,52.13,52.74,53.36,53.95,54.52,55.06,55.56,56.03,56.45,56.83,57.17,57.48,57.75,58.01,58.22,58.4,58.2,57.9,57.4,57.0,56.5,56.2,55.5,55.0,54.7,54.3,54.2,54.2,54.3,54.4,54.9,55.4,55.7,56.6,57.3,57.8,58.1,58.5,59.0,59.1,59.4,59.7 
+Canada,68.53,68.72,69.1,69.96,70.02,70.0,69.92,70.58,70.62,71.0,71.22,71.25,71.26,71.64,71.74,71.86,72.07,72.23,72.39,72.58,72.91,72.81,73.04,73.12,73.41,73.84,74.13,74.46,74.81,75.05,75.46,75.67,76.04,76.33,76.31,76.46,76.76,76.82,77.09,77.4,77.6,77.7,77.8,77.9,78.0,78.3,78.6,78.8,79.0,79.2,79.5,79.6,79.8,80.1,80.2,80.5,80.6,80.8,81.1,81.3,81.6,81.6,81.6,81.7,81.7,81.7 +Cape Verde,48.45,48.63,48.81,49.0,49.19,49.38,49.57,49.76,49.95,50.12,50.27,50.43,50.59,50.77,51.0,51.32,51.75,52.32,53.0,53.78,54.65,55.57,56.5,57.41,58.3,59.16,60.0,60.82,61.62,62.41,63.19,63.95,64.69,65.43,66.12,66.75,67.33,67.85,68.3,68.7,68.6,68.6,68.4,68.3,68.3,68.2,68.2,68.2,68.2,68.4,68.6,68.7,68.9,69.1,69.3,69.6,69.6,70.4,70.7,71.1,71.4,71.9,72.3,72.7,72.9,73.1 +Central African Republic,33.34,33.79,34.26,34.72,35.18,35.62,36.07,36.53,36.97,37.43,37.89,38.36,38.85,39.36,39.92,40.5,41.15,41.84,42.57,43.36,44.19,45.04,45.91,46.77,47.6,48.36,49.07,49.7,50.21,50.61,50.86,50.96,50.95,50.81,50.57,50.21,49.8,49.34,48.86,48.4,48.1,48.0,47.5,47.2,46.7,46.3,45.9,45.7,45.5,45.3,45.2,45.2,45.2,45.4,45.5,45.8,46.2,46.8,47.6,47.9,48.1,48.5,47.8,48.2,49.6,51.04 +Chad,37.29,37.69,38.09,38.49,38.9,39.31,39.72,40.14,40.54,40.95,41.35,41.76,42.17,42.58,43.01,43.48,43.98,44.54,45.12,45.72,46.33,46.91,47.47,47.98,48.45,48.89,49.31,49.72,50.14,50.56,50.97,51.38,51.78,52.15,52.51,52.81,53.09,53.33,53.52,53.7,54.3,53.9,54.0,53.6,53.6,53.0,52.5,52.1,51.7,51.5,51.7,51.9,52.1,52.6,53.0,53.1,54.0,54.3,55.2,55.8,56.1,56.3,56.6,56.8,57.4,58.01 +Channel Islands,68.71,69.09,69.43,69.72,69.97,70.19,70.37,70.52,70.64,70.74,70.83,70.93,71.03,71.14,71.27,71.39,71.51,71.62,71.73,71.82,71.92,72.02,72.13,72.26,72.41,72.58,72.77,72.98,73.21,73.44,73.67,73.89,74.1,74.3,74.49,74.68,74.87,75.07,75.29,75.51,75.73,75.94,76.14,76.34,76.53,76.72,76.92,77.14,77.37,77.61,77.87,78.14,78.41,78.67,78.93,79.16,79.38,79.57,79.75,79.9,80.05,80.19,80.32,80.47,80.61,80.75 +Chile,54.35,54.56,54.79,55.03,55.29,55.57,55.86,56.16,56.5,56.85,57.23,57.63,58.07,58.54,59.03,59.54,60.07,60.61,61.17,61.74,62.34,62.98,63.63,64.31,65.02,65.75,66.5,67.25,67.99,68.7,69.36,69.97,70.51,71.0,71.42,71.8,72.14,72.47,72.79,73.1,74.1,75.0,75.2,75.3,75.4,75.7,76.2,76.6,76.9,77.3,77.4,77.7,77.8,78.0,78.2,78.2,78.3,78.5,78.5,78.5,78.9,79.1,79.1,79.2,79.4,79.6 +China,41.98,42.91,43.85,45.7,47.2,49.57,49.62,49.17,37.36,30.53,32.95,43.29,50.64,52.0,54.28,55.37,56.9,57.87,59.38,61.0,62.04,61.36,60.97,60.63,60.78,60.46,61.94,62.15,62.95,63.92,64.2,65.28,65.49,65.68,65.87,66.05,66.23,66.39,66.56,66.7,67.0,67.2,67.5,67.9,68.4,68.8,69.1,69.4,69.6,69.8,70.0,70.2,70.9,71.4,71.9,72.6,73.1,73.4,73.9,74.3,74.9,75.3,75.7,75.9,76.2,76.5 +Colombia,49.7,50.93,52.08,53.16,54.15,55.07,55.91,56.69,57.39,58.03,58.63,59.18,59.71,60.21,60.7,61.16,61.6,62.03,62.43,62.83,63.23,63.64,64.08,64.53,65.04,65.58,66.17,66.79,67.43,68.07,68.67,69.24,69.72,70.13,70.48,70.74,70.96,71.14,71.32,71.5,71.1,71.1,71.4,71.6,72.0,72.2,72.8,73.1,73.2,73.3,73.5,73.7,74.5,74.7,75.1,75.3,75.9,76.2,76.2,76.4,77.0,77.3,77.5,77.8,78.0,78.2 +Comoros,40.58,40.91,41.25,41.61,41.99,42.38,42.78,43.19,43.61,44.04,44.47,44.89,45.32,45.75,46.18,46.63,47.1,47.58,48.09,48.61,49.12,49.63,50.12,50.59,51.03,51.46,51.89,52.3,52.72,53.15,53.59,54.03,54.48,54.93,55.36,55.77,56.15,56.5,56.81,57.1,57.4,57.8,58.2,58.5,58.9,58.4,59.4,60.0,61.4,62.1,63.0,63.8,64.8,65.5,66.0,66.3,66.6,67.1,66.7,67.7,67.2,67.6,67.8,68.0,68.1,68.2 +"Congo, Dem. 
Rep.",40.07,40.58,41.06,41.53,41.97,42.39,42.79,43.17,43.54,43.9,44.25,44.61,44.98,45.36,45.77,46.2,46.66,47.14,47.63,48.13,48.6,49.05,49.46,49.83,50.17,50.49,50.8,51.11,51.43,51.76,52.09,52.41,52.72,53.0,53.28,53.55,53.81,54.07,54.31,54.5,54.4,54.3,54.3,54.3,54.0,51.8,53.2,53.5,54.0,54.3,54.5,54.7,54.9,55.9,56.4,56.8,57.1,57.5,57.9,58.4,58.8,59.1,59.6,60.1,60.8,61.51 +"Congo, Rep.",41.81,42.56,43.32,44.05,44.78,45.5,46.21,46.92,47.6,48.25,48.88,49.47,50.04,50.55,51.02,51.45,51.84,52.21,52.54,52.85,53.14,53.42,53.69,53.94,54.2,54.45,54.71,54.97,55.22,55.45,55.65,55.81,55.93,55.98,55.94,55.79,55.54,55.21,54.78,54.3,54.4,54.4,53.5,53.2,52.6,52.2,46.3,49.9,51.6,52.5,53.5,54.3,55.0,55.8,56.7,57.8,58.3,58.8,59.8,60.4,60.9,61.3,61.5,61.5,61.5,61.5 +Costa Rica,56.6,57.19,57.79,58.38,58.98,59.57,60.17,60.77,61.37,61.97,62.56,63.13,63.7,64.26,64.8,65.33,65.85,66.35,66.84,67.34,67.86,68.4,68.95,69.53,70.12,70.75,71.38,72.0,72.62,73.2,73.73,74.22,74.66,75.04,75.37,75.66,75.9,76.14,76.37,76.6,76.5,76.6,76.6,76.7,76.8,76.8,77.0,77.2,77.5,77.7,78.0,78.2,78.4,78.7,79.0,79.3,79.6,79.8,79.8,79.8,79.9,80.0,80.1,80.2,80.3,80.4 +Cote d'Ivoire,32.0,32.54,33.1,33.71,34.36,35.03,35.75,36.49,37.24,38.0,38.74,39.46,40.17,40.84,41.51,42.21,42.93,43.7,44.53,45.38,46.27,47.15,48.02,48.85,49.63,50.37,51.06,51.7,52.31,52.87,53.38,53.87,54.31,54.69,55.02,55.26,55.43,55.5,55.46,55.3,54.9,54.4,53.7,53.2,52.5,52.3,52.3,52.2,52.2,52.0,52.1,52.3,52.6,52.8,53.4,54.1,54.9,55.4,56.0,56.6,57.0,57.5,58.1,58.5,59.1,59.71 +Croatia,60.57,61.08,61.6,62.1,62.58,63.06,63.52,63.98,64.41,64.85,65.26,65.66,66.05,66.43,66.8,67.16,67.52,67.87,68.22,68.54,68.86,69.14,69.4,69.63,69.83,70.0,70.16,70.3,70.42,70.56,70.71,70.89,71.08,71.31,71.54,71.78,72.0,72.22,72.4,72.6,71.9,72.3,72.9,73.4,73.0,73.4,73.4,73.5,73.8,74.2,74.6,74.9,75.1,75.3,75.7,75.9,76.0,76.2,76.4,76.7,77.1,77.4,77.6,77.8,77.8,77.8 +Cuba,58.53,59.12,59.71,60.29,60.89,61.48,62.07,62.66,63.25,63.85,64.47,65.09,65.71,66.35,66.99,67.6,68.2,68.78,69.32,69.84,70.34,70.82,71.29,71.74,72.18,72.59,72.96,73.3,73.59,73.84,74.05,74.22,74.36,74.48,74.57,74.62,74.65,74.67,74.67,74.7,74.8,74.7,74.7,74.8,75.0,75.2,75.4,75.6,75.8,76.2,76.4,76.8,76.9,77.0,77.1,77.3,77.5,77.6,77.7,77.8,77.9,78.0,78.0,78.1,78.2,78.3 +Cyprus,66.13,66.58,67.03,67.45,67.87,68.26,68.65,69.01,69.38,69.72,70.06,70.38,70.71,71.02,71.33,71.62,71.92,72.19,72.47,72.73,72.99,73.23,73.47,73.7,73.93,74.15,74.37,74.58,74.79,74.99,75.19,75.38,75.58,75.76,75.95,76.12,76.3,76.47,76.64,76.8,76.4,76.7,76.8,76.4,76.7,77.1,77.1,77.1,77.5,77.7,78.5,78.7,79.0,79.1,79.0,79.5,79.8,80.0,80.3,80.6,81.1,81.5,81.7,81.7,81.8,81.9 +Czech Republic,65.32,66.94,67.64,68.14,69.06,69.47,69.14,70.05,70.04,70.58,70.77,70.04,70.56,70.73,70.43,70.65,70.55,70.11,69.62,69.72,69.96,70.49,70.33,70.42,70.77,70.88,70.94,71.02,71.13,70.67,71.11,71.22,71.0,71.26,71.48,71.42,71.87,72.08,72.13,71.8,72.0,72.3,72.7,73.0,73.4,73.8,74.2,74.5,74.7,75.0,75.3,75.4,75.6,75.9,76.2,76.5,76.8,77.1,77.3,77.5,77.8,78.1,78.3,78.6,78.8,79.0 +Denmark,70.97,70.82,71.2,71.4,71.97,72.11,71.87,72.3,72.29,72.28,72.55,72.43,72.52,72.61,72.49,72.57,73.06,73.27,73.36,73.49,73.55,73.59,73.83,73.96,74.24,73.91,74.82,74.59,74.41,74.3,74.44,74.78,74.65,74.81,74.68,74.86,74.97,75.06,75.1,75.1,75.4,75.4,75.4,75.4,75.6,75.9,76.2,76.7,76.3,77.1,77.2,77.2,77.6,77.8,78.3,78.3,78.4,78.9,79.1,79.4,79.9,80.3,80.3,80.3,80.4,80.5 
+Djibouti,41.48,41.89,42.31,42.77,43.23,43.71,44.21,44.73,45.24,45.77,46.28,46.79,47.3,47.8,48.33,48.9,49.53,50.23,50.99,51.75,52.51,53.2,53.83,54.38,54.85,55.29,55.71,56.15,56.61,57.1,57.59,58.08,58.55,58.97,59.38,59.74,60.09,60.42,60.72,61.0,60.7,60.4,60.7,60.0,60.4,60.3,60.1,60.0,59.9,60.0,60.1,60.2,60.3,60.4,60.7,60.7,61.5,61.8,62.1,62.3,62.5,62.8,63.1,63.1,63.8,64.51 +Dominican Republic,45.6,46.5,47.39,48.27,49.15,50.01,50.87,51.71,52.54,53.37,54.17,54.97,55.75,56.52,57.28,58.02,58.75,59.47,60.16,60.83,61.47,62.09,62.67,63.23,63.75,64.25,64.73,65.19,65.65,66.12,66.6,67.11,67.63,68.18,68.75,69.34,69.96,70.58,71.2,71.8,72.2,72.5,72.5,72.5,72.6,72.6,72.9,72.9,73.2,73.3,73.4,73.5,73.5,73.1,73.3,73.5,73.7,74.1,74.3,74.4,74.6,74.7,74.9,75.1,75.3,75.5 +Ecuador,48.06,48.64,49.23,49.87,50.54,51.23,51.93,52.65,53.38,54.09,54.77,55.42,56.01,56.53,57.02,57.47,57.89,58.32,58.76,59.21,59.67,60.16,60.67,61.18,61.73,62.3,62.9,63.51,64.16,64.82,65.49,66.17,66.85,67.53,68.18,68.83,69.46,70.06,70.64,71.2,71.4,71.7,71.8,72.2,72.3,72.5,72.7,72.8,73.1,73.2,73.4,73.6,73.7,73.9,74.1,74.3,74.5,74.7,74.9,75.1,75.3,75.5,75.6,75.8,75.9,76.0 +Egypt,39.32,40.72,42.03,43.22,44.3,45.29,46.17,46.97,47.68,48.31,48.89,49.43,49.94,50.42,50.88,51.29,51.65,51.97,52.25,52.54,52.88,53.31,53.84,54.46,55.17,55.93,56.69,57.45,58.16,58.85,59.52,60.21,60.93,61.65,62.38,63.07,63.7,64.27,64.76,65.2,65.4,66.1,66.4,66.7,67.4,67.9,68.2,68.6,69.0,69.7,69.7,69.8,69.8,69.9,70.1,70.1,70.3,70.2,70.1,70.1,70.4,70.5,71.0,71.3,71.5,71.7 +El Salvador,44.11,45.06,45.99,46.9,47.8,48.68,49.55,50.39,51.22,52.02,52.77,53.5,54.18,54.81,55.4,55.93,56.41,56.84,57.24,57.57,57.85,58.07,58.22,58.33,58.36,58.33,58.22,58.09,57.98,57.96,58.13,58.53,59.19,60.11,61.24,62.54,63.91,65.28,66.55,67.7,68.1,68.9,69.3,69.6,70.0,70.3,70.8,71.0,71.6,71.9,71.7,72.5,72.6,72.8,73.0,73.3,73.5,73.7,73.8,74.1,74.3,74.5,74.6,74.8,74.9,75.0 +Equatorial Guinea,34.55,34.9,35.25,35.59,35.95,36.3,36.65,36.99,37.34,37.69,38.04,38.38,38.73,39.08,39.44,39.78,40.13,40.48,40.82,41.17,41.52,41.87,42.21,42.56,42.91,43.28,43.65,44.04,44.44,44.85,45.26,45.69,46.11,46.52,46.92,47.33,47.73,48.14,48.52,48.9,48.7,48.7,48.6,48.5,48.5,48.9,50.3,51.2,52.0,52.9,54.0,54.9,55.3,55.9,56.0,56.8,57.1,57.5,58.0,58.6,58.7,59.4,60.5,61.0,61.0,61.0 +Eritrea,36.47,36.75,37.02,37.29,37.58,37.86,38.14,38.42,38.73,39.03,39.35,39.69,40.04,40.41,40.81,41.22,41.66,42.1,42.56,43.02,43.47,43.92,44.35,44.75,45.14,45.49,45.8,46.09,46.38,46.66,46.97,47.33,47.74,48.21,48.77,49.38,50.06,50.8,51.58,52.4,53.4,54.9,56.2,57.0,57.8,58.4,59.0,58.8,52.2,37.6,59.9,60.0,59.9,60.0,59.9,60.0,60.1,60.1,60.1,60.1,60.2,60.3,60.4,60.6,60.7,60.8 +Estonia,59.91,61.13,63.7,65.05,65.73,67.36,67.84,68.29,68.72,69.42,69.74,69.93,69.99,70.74,70.81,70.78,71.08,70.7,70.4,70.51,70.71,70.48,70.83,70.94,70.26,69.88,70.01,69.87,69.66,69.75,69.62,70.03,69.95,69.83,69.97,71.11,71.13,71.17,70.73,70.1,69.6,69.3,68.2,66.3,67.7,69.8,70.0,69.5,70.2,70.4,70.0,70.9,71.5,72.0,72.5,72.9,73.0,74.2,74.9,76.4,76.3,76.7,77.5,77.6,77.8,78.0 +Ethiopia,33.09,33.41,33.8,34.23,34.72,35.25,32.41,30.37,37.08,37.72,38.35,38.94,39.49,39.36,38.13,39.09,41.09,41.38,41.65,41.9,42.14,41.98,39.85,37.71,38.78,42.86,42.41,42.07,42.74,42.8,42.87,42.93,42.5,39.46,35.43,41.39,43.95,44.4,44.82,45.2,46.9,47.8,48.4,48.8,49.2,50.0,50.6,51.1,50.6,52.1,52.7,53.6,54.3,55.2,56.1,57.2,58.6,60.0,61.2,62.1,62.9,63.6,64.2,64.7,65.2,65.7 
+Fiji,51.3,51.85,52.38,52.9,53.4,53.89,54.36,54.81,55.26,55.7,56.12,56.54,56.94,57.35,57.75,58.14,58.52,58.89,59.26,59.61,59.96,60.29,60.6,60.91,61.21,61.5,61.8,62.09,62.37,62.65,62.92,63.2,63.46,63.71,63.96,64.2,64.43,64.66,64.88,65.1,65.1,65.0,64.8,64.7,64.5,64.3,64.1,64.2,64.1,64.2,64.4,64.5,64.6,64.7,64.8,64.8,64.9,64.9,64.9,65.2,65.3,65.4,65.6,65.7,65.8,65.9 +Finland,65.68,66.56,66.63,67.59,67.39,68.01,67.51,68.65,68.83,69.03,69.07,68.78,69.19,69.4,69.16,69.68,69.86,69.82,69.7,70.4,70.22,70.91,71.42,71.34,71.89,72.04,72.56,73.13,73.42,73.71,74.03,74.6,74.51,74.82,74.49,74.86,74.89,74.85,75.07,75.1,75.4,75.7,76.0,76.4,76.7,76.8,77.1,77.3,77.5,77.8,78.1,78.3,78.5,78.8,79.0,79.2,79.4,79.6,79.8,80.0,80.3,80.5,80.8,80.9,80.9,80.9 +France,66.17,67.46,67.4,68.27,68.54,68.57,69.0,70.24,70.27,70.49,71.07,70.61,70.46,71.43,71.26,71.67,71.67,71.66,71.4,72.29,72.27,72.52,72.69,73.04,73.13,73.38,73.99,74.12,74.43,74.53,74.69,75.07,75.06,75.56,75.67,75.95,76.55,76.78,76.91,77.2,77.3,77.6,77.7,78.0,78.2,78.4,78.7,78.7,78.8,79.1,79.2,79.4,79.6,80.2,80.4,80.7,81.0,81.1,81.2,81.4,81.6,81.6,81.7,81.7,81.8,81.9 +French Guiana,52.52,53.05,53.58,54.12,54.67,55.22,55.78,56.37,57.0,57.68,58.44,59.28,60.19,61.14,62.1,63.0,63.8,64.46,64.97,65.34,65.57,65.71,65.81,65.91,66.04,66.24,66.51,66.87,67.3,67.79,68.31,68.83,69.33,69.79,70.2,70.57,70.92,71.27,71.6,71.94,72.27,72.61,72.93,73.25,73.56,73.84,74.1,74.34,74.55,74.75,74.92,75.07,75.21,75.35,75.5,75.65,75.82,76.01,76.21,76.43,76.65,76.89,77.12,77.35,77.58,77.81 +French Polynesia,46.52,48.28,49.86,51.27,52.5,53.55,54.44,55.18,55.78,56.28,56.71,57.09,57.47,57.85,58.24,58.65,59.06,59.45,59.83,60.18,60.52,60.84,61.15,61.47,61.82,62.23,62.72,63.28,63.9,64.56,65.22,65.84,66.39,66.87,67.27,67.59,67.88,68.15,68.42,68.7,69.01,69.33,69.68,70.05,70.43,70.82,71.21,71.59,71.96,72.31,72.67,73.03,73.4,73.77,74.13,74.48,74.81,75.11,75.38,75.62,75.84,76.05,76.26,76.47,76.69,76.91 +Gabon,35.84,36.34,36.8,37.19,37.54,37.83,38.1,38.33,38.56,38.83,39.15,39.56,40.07,40.7,41.42,42.21,43.06,43.9,44.74,45.55,46.35,47.13,47.9,48.68,49.45,50.23,51.01,51.81,52.61,53.42,54.24,55.07,55.88,56.66,57.4,58.04,58.58,59.0,59.32,59.5,59.8,60.2,60.1,59.9,59.8,59.6,59.9,60.0,59.7,59.3,59.0,59.4,59.4,59.4,60.1,60.9,61.6,61.7,62.1,63.0,63.3,63.9,64.4,65.0,65.9,66.81 +Gambia,31.85,32.33,32.78,33.22,33.65,34.06,34.46,34.86,35.27,35.7,36.16,36.68,37.26,37.91,38.66,39.47,40.36,41.3,42.3,43.31,44.36,45.42,46.47,47.51,48.56,49.58,50.6,51.61,52.62,53.61,54.59,55.55,56.48,57.37,58.21,58.97,59.65,60.26,60.81,61.3,61.5,61.5,62.0,62.3,62.6,62.8,63.1,63.4,63.4,63.6,63.9,63.8,64.4,64.7,64.9,65.2,65.3,65.7,66.0,66.5,67.1,67.5,67.8,68.0,68.1,68.2 +Georgia,59.96,60.36,60.75,61.15,61.54,61.93,62.32,62.72,63.11,63.5,63.9,64.31,64.71,65.11,65.52,65.9,66.26,66.6,66.93,67.24,67.54,67.85,68.17,68.47,68.76,69.0,69.19,69.31,69.37,69.4,69.42,69.46,69.52,69.62,69.75,69.86,69.94,69.96,69.95,69.9,69.9,69.4,69.2,70.2,70.7,71.2,71.3,71.4,71.4,71.4,71.7,71.6,71.7,71.5,71.8,71.9,72.1,71.8,72.1,72.2,72.2,72.4,72.5,72.6,72.9,73.2 +Germany,67.08,67.4,67.7,68.0,68.28,68.57,68.49,69.23,69.34,69.26,69.85,70.01,70.1,70.66,70.65,70.77,70.99,70.64,70.48,70.72,70.94,71.16,71.41,71.71,71.56,72.02,72.63,72.6,72.96,73.14,73.37,73.69,73.97,74.44,74.55,74.75,75.15,75.33,75.51,75.4,75.6,76.0,76.1,76.4,76.6,76.9,77.3,77.6,77.8,78.1,78.4,78.6,78.8,79.2,79.4,79.7,79.9,80.0,80.1,80.3,80.5,80.6,80.7,80.7,80.8,80.9 
+Ghana,41.66,42.22,42.76,43.3,43.83,44.36,44.87,45.37,45.86,46.34,46.8,47.25,47.66,48.07,48.44,48.8,49.14,49.46,49.78,50.08,50.39,50.7,51.02,51.35,51.68,52.0,52.33,52.63,52.95,53.26,53.6,53.95,54.34,54.76,55.23,55.75,56.31,56.89,57.47,58.0,58.4,58.7,59.5,59.6,60.0,60.1,59.8,60.1,60.1,60.0,59.9,60.0,60.2,60.5,60.8,61.2,61.6,62.0,62.4,62.9,63.5,64.1,64.5,64.8,65.3,65.8 +Greece,65.57,65.72,65.92,66.16,66.46,66.79,67.16,67.57,67.99,68.41,68.8,69.14,69.44,69.69,69.91,70.12,70.34,70.59,70.88,71.2,71.53,71.85,72.13,72.39,72.62,72.85,73.1,73.38,73.68,74.01,74.33,74.64,74.94,75.21,75.47,75.73,76.01,76.32,76.66,77.0,77.1,77.1,77.5,77.7,77.8,77.9,78.1,78.2,78.3,78.6,78.9,79.1,79.3,79.4,79.6,80.0,79.8,80.2,80.2,80.4,80.5,80.6,81.0,81.0,81.0,81.0 +Greenland,43.94,45.59,48.67,51.76,54.85,57.94,58.82,59.71,60.6,61.49,61.85,62.22,62.59,62.97,63.34,63.71,64.08,64.45,64.82,65.19,65.01,64.84,64.66,64.49,64.31,64.14,63.96,63.78,63.61,63.09,62.71,62.8,62.89,63.05,63.42,63.81,64.22,64.14,64.22,64.6,65.1,65.5,65.9,66.3,66.5,66.8,66.9,67.2,67.5,67.8,68.0,68.3,68.5,68.8,69.1,69.5,70.0,70.3,70.6,70.8,71.2,71.6,71.8,72.0,72.1,72.2 +Grenada,55.81,56.39,56.97,57.52,58.07,58.61,59.12,59.63,60.11,60.59,61.05,61.49,61.93,62.35,62.76,63.16,63.54,63.91,64.27,64.62,64.97,65.29,65.62,65.92,66.22,66.52,66.79,67.07,67.33,67.6,67.86,68.1,68.35,68.59,68.83,69.06,69.28,69.5,69.7,69.9,70.2,70.2,70.0,70.4,70.7,70.8,70.8,70.6,70.6,70.5,70.3,70.2,70.2,69.3,70.3,70.5,70.7,70.8,70.9,71.0,71.0,71.1,71.2,71.4,71.5,71.6 +Guadeloupe,52.09,52.94,53.77,54.57,55.35,56.11,56.84,57.55,58.24,58.91,59.58,60.23,60.87,61.51,62.14,62.75,63.34,63.91,64.46,64.98,65.49,65.99,66.49,66.97,67.46,67.93,68.4,68.86,69.31,69.75,70.18,70.6,71.01,71.42,71.82,72.21,72.6,72.98,73.35,73.72,74.08,74.44,74.79,75.14,75.48,75.82,76.15,76.48,76.8,77.12,77.43,77.74,78.04,78.35,78.65,78.95,79.25,79.55,79.85,80.14,80.43,80.69,80.95,81.18,81.41,81.64 +Guam,56.53,57.04,57.55,58.08,58.6,59.12,59.65,60.18,60.71,61.24,61.76,62.28,62.79,63.29,63.78,64.26,64.72,65.18,65.63,66.06,66.49,66.9,67.3,67.7,68.07,68.43,68.79,69.13,69.47,69.8,70.12,70.42,70.73,71.02,71.3,71.58,71.84,72.09,72.35,72.6,72.4,72.4,72.5,72.7,73.0,73.2,69.4,73.4,73.5,73.6,73.6,73.6,73.5,73.3,73.1,72.7,72.4,72.1,71.8,71.6,71.5,71.5,71.6,71.6,71.7,71.8 +Guatemala,42.06,42.44,42.83,43.27,43.73,44.23,44.77,45.32,45.91,46.51,47.12,47.76,48.4,49.05,49.73,50.43,51.16,51.93,52.72,53.5,54.27,55.0,55.67,56.28,56.82,57.32,57.78,58.22,58.66,59.12,59.6,60.1,60.62,61.16,61.72,62.28,62.86,63.45,64.02,64.6,64.0,63.8,64.2,64.6,66.9,68.1,67.7,67.7,68.8,68.8,69.3,70.0,70.1,70.2,69.8,70.2,71.0,71.2,70.9,71.2,71.6,72.1,72.3,72.4,72.6,72.8 +Guinea,33.12,33.44,33.74,34.04,34.35,34.64,34.92,35.2,35.46,35.71,35.95,36.17,36.37,36.57,36.77,36.96,37.16,37.37,37.61,37.89,38.2,38.57,38.97,39.43,39.94,40.47,41.04,41.63,42.25,42.92,43.66,44.47,45.37,46.32,47.33,48.35,49.37,50.33,51.22,52.0,52.3,52.5,53.0,53.1,53.4,53.8,54.0,54.0,54.0,54.2,54.4,54.7,55.1,55.6,56.0,56.4,56.8,57.1,57.5,57.9,58.2,58.5,58.8,58.6,59.1,59.6 +Guinea-Bissau,39.65,40.03,40.42,40.81,41.2,41.58,41.97,42.36,42.75,43.14,43.39,43.64,43.89,44.15,44.39,44.63,44.86,45.09,45.29,45.5,45.71,45.91,46.12,46.33,46.54,46.77,47.02,47.27,47.54,47.83,48.13,48.45,48.78,49.13,49.49,49.87,50.26,50.67,51.09,51.5,51.7,51.8,52.0,52.2,52.3,52.6,52.8,51.7,52.5,52.8,52.7,52.7,52.8,52.8,52.9,53.0,53.2,53.6,53.9,54.3,54.5,54.8,55.1,55.3,55.6,55.9 
+Guyana,57.51,57.68,57.85,58.04,58.21,58.38,58.56,58.73,58.9,59.08,59.24,59.41,59.58,59.75,59.92,60.09,60.25,60.43,60.59,60.75,60.92,61.08,61.24,61.4,61.56,61.72,61.88,62.03,62.18,62.34,62.5,62.67,62.85,63.02,63.21,63.41,63.61,63.8,64.01,64.2,64.3,64.5,64.4,64.5,64.4,64.3,64.3,64.3,64.3,64.2,63.9,63.5,63.7,64.2,64.4,64.8,64.9,65.0,65.3,65.5,65.6,65.9,66.2,66.4,66.8,67.2 +Haiti,36.56,37.22,37.87,38.5,39.12,39.74,40.34,40.93,41.52,42.1,42.68,43.26,43.82,44.38,44.93,45.43,45.9,46.33,46.73,47.1,47.45,47.81,48.17,48.53,48.9,49.28,49.63,49.97,50.3,50.62,50.94,51.29,51.64,52.02,52.42,52.81,53.21,53.59,53.95,54.3,54.4,54.9,54.7,55.4,56.2,56.7,57.0,57.5,58.0,58.7,59.2,59.6,59.7,58.6,60.0,60.3,60.8,61.0,61.7,32.2,62.4,62.9,63.4,63.8,64.3,64.8 +Honduras,41.86,42.39,42.95,43.54,44.16,44.83,45.52,46.23,46.97,47.71,48.47,49.21,49.94,50.65,51.35,52.02,52.68,53.34,54.0,54.68,55.37,56.07,56.8,57.56,58.34,59.15,59.97,60.8,61.65,62.5,63.36,64.24,65.12,66.01,66.86,67.67,68.42,69.11,69.73,70.3,70.3,70.1,69.9,70.1,70.1,70.1,70.2,63.9,70.3,70.5,70.6,70.7,70.8,71.0,71.2,71.4,71.6,71.8,71.9,72.0,72.2,72.3,72.6,72.8,73.0,73.2 +"Hong Kong, China",62.38,62.9,63.43,63.98,64.54,65.11,65.69,66.28,66.87,67.45,68.01,68.55,69.05,69.52,69.96,70.37,70.77,71.15,71.53,71.89,72.25,72.58,72.9,73.2,73.49,73.78,74.06,74.35,74.64,74.93,75.22,75.5,75.77,76.03,76.28,76.53,76.78,77.02,77.27,77.52,77.77,78.01,78.25,78.48,78.72,78.99,79.29,79.63,79.99,80.36,80.73,81.08,81.4,81.68,81.92,82.12,82.31,82.49,82.66,82.84,83.02,83.2,83.38,83.56,83.73,83.9 +Hungary,62.48,64.05,63.89,65.46,66.91,66.07,66.44,67.45,67.35,68.13,69.06,68.0,69.02,69.52,69.22,69.98,69.55,69.38,69.45,69.29,69.19,69.82,69.69,69.41,69.46,69.75,70.02,69.56,69.77,69.18,69.24,69.47,69.03,69.07,69.01,69.22,69.69,70.09,69.53,69.5,69.2,69.1,69.2,69.5,70.1,70.5,70.9,71.1,71.3,71.8,72.3,72.6,72.7,72.9,73.1,73.3,73.6,73.9,74.3,74.6,75.0,75.5,76.1,76.5,76.7,76.9 +Iceland,71.12,72.57,72.39,73.45,73.4,73.08,73.58,73.55,72.78,74.22,73.6,73.82,73.13,73.72,74.0,73.4,73.9,74.12,73.9,74.0,73.75,74.66,74.52,74.59,75.57,76.94,76.35,76.66,76.88,76.92,76.61,77.26,76.91,77.71,77.85,78.38,77.53,77.39,78.46,78.3,78.4,78.6,78.8,79.1,78.9,79.4,79.6,79.9,80.2,80.5,80.8,81.0,81.3,81.5,81.7,81.8,82.1,82.4,82.5,82.8,82.9,83.1,83.2,83.3,83.3,83.3 +India,35.1,35.76,36.44,37.11,37.79,38.48,39.16,39.85,40.56,41.26,41.99,42.72,43.46,44.23,44.98,45.73,46.49,47.21,47.93,48.65,49.35,50.08,50.81,51.53,52.25,52.93,53.56,54.14,54.65,55.1,55.51,55.86,56.19,56.51,56.81,57.11,57.39,57.65,57.93,58.2,58.5,58.8,59.1,59.5,59.9,60.2,60.5,60.8,61.2,61.5,61.9,62.3,62.8,63.2,63.6,63.9,64.3,64.7,65.0,65.4,65.7,66.1,66.5,66.9,67.2,67.5 +Indonesia,36.99,37.93,38.86,39.78,40.68,41.57,42.45,43.32,44.17,45.01,45.83,46.65,47.45,48.24,43.77,44.18,50.54,51.27,52.0,52.71,53.4,54.09,54.75,55.41,56.04,56.67,57.27,57.87,58.45,59.01,59.57,60.12,60.64,61.16,61.66,62.15,62.63,63.1,63.55,64.0,64.5,64.9,65.3,65.7,66.1,66.4,66.7,67.0,67.2,67.5,67.8,68.0,68.2,66.7,68.7,68.9,69.2,69.4,69.6,69.8,70.1,70.3,70.6,70.8,71.1,71.4 +Iran,40.29,40.92,41.56,42.19,42.84,43.47,44.11,44.74,45.38,46.0,46.61,47.22,47.83,48.43,49.04,49.66,50.3,50.98,51.67,52.43,53.28,54.24,55.24,56.24,57.1,57.64,57.78,57.52,56.95,56.24,55.62,55.32,55.49,56.19,57.39,59.01,60.83,62.67,64.43,66.0,67.8,68.5,69.1,69.6,69.9,69.8,70.3,70.8,71.3,71.4,71.3,71.3,70.1,71.5,71.9,72.4,72.8,73.1,73.4,73.7,74.1,74.3,74.5,74.6,74.6,74.6 
+Iraq,35.08,36.58,38.04,39.45,40.81,42.11,43.38,44.61,45.79,46.96,48.11,49.24,50.36,51.46,52.52,53.51,54.42,55.24,55.96,56.61,57.21,57.77,58.31,58.81,59.19,59.35,59.26,58.94,58.44,57.91,57.52,57.41,57.68,58.34,59.36,60.65,62.05,63.41,64.65,65.7,63.9,65.4,65.4,65.4,65.3,65.3,65.2,65.7,65.9,65.8,66.4,66.1,66.1,66.3,65.7,65.1,65.3,66.6,67.1,67.3,67.7,68.1,68.3,67.7,67.4,67.1 +Ireland,65.07,67.52,68.3,68.44,68.46,69.43,69.51,69.84,69.99,70.76,70.24,70.57,70.85,71.12,71.35,70.89,71.95,71.67,71.62,71.68,72.5,71.86,72.11,72.08,72.68,72.81,72.98,72.98,73.28,73.66,74.04,74.34,74.4,74.87,74.84,74.93,75.76,75.82,75.87,76.3,76.7,76.8,76.9,77.3,77.1,77.5,77.4,77.6,77.7,77.8,78.4,78.8,79.1,79.3,79.7,79.8,80.1,80.1,80.3,81.0,80.6,81.1,81.5,81.6,81.7,81.8 +Israel,64.42,65.04,65.62,66.15,66.65,67.1,67.51,67.89,68.24,68.55,68.85,69.13,69.41,69.68,69.93,70.17,70.39,70.6,70.78,70.96,71.13,71.33,71.54,71.78,72.04,72.33,72.62,72.9,73.19,73.47,73.74,73.99,74.49,74.78,75.1,74.92,75.29,75.65,76.24,76.7,76.5,76.3,76.9,77.1,77.4,77.7,77.9,78.1,78.5,78.6,78.8,78.6,79.1,79.5,79.7,79.6,80.3,80.6,81.0,81.6,81.6,82.1,82.0,81.3,82.1,82.91 +Italy,65.3,65.93,66.56,67.88,68.23,67.62,67.79,68.85,69.3,69.19,69.82,69.21,69.32,70.37,70.24,70.99,71.03,70.85,70.87,71.62,71.87,72.15,72.09,72.81,72.72,73.07,73.44,73.78,74.11,74.07,74.46,74.93,74.75,75.51,75.62,75.94,76.36,76.54,76.94,77.0,77.0,77.3,77.6,77.8,78.1,78.3,78.7,78.9,79.3,79.6,79.8,80.1,80.1,80.9,81.1,81.2,81.3,81.5,81.6,81.9,82.0,82.0,82.1,82.1,82.2,82.3 +Jamaica,58.02,59.06,60.07,61.03,61.95,62.83,63.66,64.47,65.21,65.91,66.57,67.17,67.74,68.25,68.73,69.17,69.58,69.99,70.36,70.72,71.06,71.39,71.71,72.0,72.29,72.58,72.89,73.21,73.52,73.82,74.1,74.34,74.55,74.7,74.79,74.84,74.85,74.85,74.83,74.8,74.9,74.9,74.8,74.8,74.7,74.5,74.4,74.5,74.6,74.4,74.2,74.5,74.8,75.0,75.4,75.5,75.3,75.1,74.8,74.8,74.6,74.7,74.8,74.8,75.0,75.2 +Japan,60.98,63.02,63.36,64.6,65.76,65.62,65.49,67.11,67.49,67.78,68.43,68.71,69.79,70.26,70.31,71.12,71.41,71.73,71.96,72.05,72.87,73.39,73.45,73.88,74.38,74.78,75.35,75.67,76.18,76.16,76.57,77.08,77.11,77.5,77.8,78.22,78.63,78.54,78.97,79.0,79.1,79.3,79.4,79.8,79.7,80.2,80.4,80.5,80.6,81.0,81.3,81.6,81.7,81.9,82.0,82.2,82.4,82.5,82.7,82.7,82.6,82.9,83.0,83.1,83.2,83.3 +Jordan,45.56,46.45,47.34,48.23,49.09,49.95,50.8,51.65,52.48,53.3,54.12,54.94,55.75,56.55,57.35,58.13,58.9,59.66,60.42,61.15,61.87,62.59,63.3,63.98,64.64,65.28,65.89,66.46,67.0,67.51,68.0,68.45,68.9,69.33,69.75,70.15,70.52,70.87,71.2,71.5,71.9,72.2,72.2,72.4,72.5,72.6,72.8,73.0,73.2,73.4,73.6,73.8,74.0,74.1,74.5,75.5,76.3,76.9,77.5,77.9,78.1,78.2,78.3,78.4,78.5,78.6 +Kazakhstan,54.67,55.15,55.63,56.11,56.58,57.05,57.51,57.98,58.44,58.91,59.38,59.85,60.31,60.79,61.24,61.69,62.12,62.53,62.92,63.27,63.6,63.89,64.16,64.4,64.64,64.88,65.13,65.41,65.71,66.05,66.43,66.84,67.28,67.7,68.07,68.34,68.48,68.47,68.31,68.0,67.6,67.1,65.3,64.6,63.6,63.5,63.9,64.2,65.0,64.9,65.1,65.4,65.3,65.3,65.3,65.3,65.8,67.1,68.2,68.5,69.1,69.7,70.0,70.2,70.2,70.2 +Kenya,42.33,42.71,43.16,43.64,44.17,44.75,45.37,46.03,46.72,47.42,48.13,48.82,49.48,50.13,50.75,51.35,51.96,52.57,53.19,53.83,54.45,55.08,55.69,56.29,56.89,57.49,58.1,58.74,59.36,59.96,60.52,61.02,61.42,61.74,61.96,62.09,62.13,62.1,62.0,61.8,61.1,60.3,59.5,58.7,58.1,57.4,56.7,56.1,55.8,55.6,55.6,55.7,55.8,56.2,57.2,58.4,59.8,60.8,61.9,62.9,63.7,64.3,64.8,65.0,65.1,65.2 
+Kiribati,42.25,42.65,43.05,43.44,43.85,44.25,44.64,45.04,45.45,45.84,46.24,46.64,47.03,47.44,47.84,48.23,48.63,49.02,49.42,49.82,50.21,50.61,51.02,51.41,51.81,52.2,52.58,52.97,53.36,53.75,54.17,54.62,55.08,55.56,56.04,56.5,56.93,57.33,57.68,58.0,58.2,58.4,58.4,58.7,58.9,59.2,59.4,59.5,59.6,59.8,60.1,60.2,60.4,60.6,60.8,61.0,61.2,61.5,61.7,61.9,62.1,62.3,62.6,62.8,63.0,63.2 +Kuwait,52.95,54.13,55.27,56.36,57.43,58.45,59.44,60.38,61.29,62.15,62.98,63.77,64.53,65.24,65.92,66.58,67.2,67.8,68.38,68.93,69.46,69.97,70.46,70.93,71.39,71.85,72.29,72.72,73.14,73.58,74.0,74.41,74.81,75.22,75.59,75.95,76.29,76.62,76.92,77.2,64.4,80.0,78.7,77.6,76.5,76.0,76.2,76.3,77.3,77.7,77.6,78.2,78.5,78.1,77.7,77.7,77.7,77.3,77.4,78.5,79.0,79.1,79.7,80.2,80.3,80.4 +Kyrgyz Republic,52.07,52.52,52.96,53.41,53.86,54.31,54.75,55.2,55.64,56.09,56.54,56.99,57.44,57.9,58.34,58.76,59.19,59.58,59.95,60.3,60.61,60.88,61.14,61.38,61.6,61.83,62.05,62.3,62.57,62.89,63.23,63.62,64.04,64.45,64.86,65.23,65.54,65.77,65.93,66.0,65.9,65.6,65.3,65.0,65.1,65.2,65.3,65.6,65.8,65.9,66.0,65.9,66.0,66.2,66.5,66.7,67.0,67.3,67.7,67.9,68.5,69.0,69.4,69.6,69.8,70.0 +Lao,39.88,40.13,40.37,40.62,40.86,41.11,41.37,41.62,41.87,42.13,42.38,42.64,42.89,43.13,43.39,43.64,43.89,44.15,44.41,44.66,44.91,45.16,45.39,45.62,45.85,46.05,46.26,46.47,46.69,46.91,47.17,47.45,47.76,48.12,48.54,49.02,49.56,50.16,50.8,51.5,52.0,52.4,52.8,53.2,53.6,54.0,54.4,54.9,55.5,56.1,56.6,57.6,58.4,59.3,60.1,60.8,61.7,62.5,63.3,64.1,65.0,65.6,66.1,66.6,67.1,67.6 +Latvia,60.48,61.88,63.19,64.38,65.46,66.44,67.31,68.07,69.53,70.37,70.6,69.97,70.36,71.62,71.29,71.26,70.94,70.56,70.29,70.31,70.66,70.35,70.29,70.22,69.37,69.48,69.56,69.45,68.93,69.23,69.18,69.75,69.51,69.56,69.72,71.09,71.14,71.05,70.55,69.6,69.1,68.4,66.7,65.7,66.5,68.6,69.3,69.0,70.0,70.5,70.0,70.4,70.8,71.2,71.1,70.8,71.3,72.4,73.3,73.9,74.6,75.1,75.0,75.2,75.4,75.6 +Lebanon,59.61,60.04,60.45,60.85,61.23,61.6,61.95,62.28,62.6,62.9,63.19,63.47,63.74,64.0,64.25,64.5,64.76,65.01,65.27,65.52,65.75,65.98,66.18,66.37,66.54,66.69,66.83,66.96,67.08,67.21,67.36,67.52,67.68,67.87,68.07,68.29,68.53,68.78,69.03,69.3,71.9,72.2,72.5,73.0,73.4,74.0,74.4,74.9,75.6,75.9,76.3,76.6,76.9,77.1,77.3,77.4,77.5,77.8,77.9,78.1,76.6,78.5,78.6,78.7,78.9,79.1 +Lesotho,41.53,42.11,42.72,43.33,43.96,44.59,45.22,45.85,46.46,47.02,47.54,47.97,48.32,48.59,48.79,48.95,49.09,49.24,49.43,49.67,49.96,50.31,50.7,51.14,51.63,52.17,52.75,53.38,54.01,54.65,55.25,55.82,56.34,56.83,57.31,57.88,58.51,59.21,59.92,60.5,60.6,60.4,60.1,59.2,58.7,57.9,56.6,54.6,52.9,50.7,48.9,47.0,45.4,44.2,43.1,43.1,43.3,44.5,45.5,46.4,46.7,46.1,45.6,45.4,47.1,48.86 +Liberia,33.11,33.36,33.6,33.84,34.07,34.28,34.51,34.73,34.98,35.24,35.54,35.88,36.28,36.73,37.23,37.77,38.33,38.91,39.49,40.1,40.75,41.43,42.16,42.91,43.68,44.45,45.21,45.92,46.57,47.14,47.6,47.97,48.25,48.46,48.58,48.62,48.59,48.55,48.53,48.6,51.5,51.8,50.1,48.9,50.9,50.4,53.8,54.4,55.2,55.8,56.3,55.4,55.2,57.9,58.4,58.8,59.3,59.9,60.3,60.8,61.5,62.3,62.9,61.8,63.2,64.63 +Libya,38.07,37.73,37.66,37.89,38.39,39.18,40.22,41.5,42.97,44.59,46.28,48.0,49.69,51.28,52.77,54.15,55.45,56.69,57.88,59.01,60.11,61.16,62.16,63.13,64.06,64.95,65.81,66.62,67.4,68.13,68.82,69.46,70.04,70.58,71.09,71.56,72.03,72.48,72.94,73.4,73.7,73.8,74.2,74.4,74.6,74.6,74.8,74.8,74.9,74.8,75.0,75.0,75.1,75.2,75.4,75.5,75.5,75.6,75.7,75.9,60.5,75.5,75.8,75.0,74.1,73.21 
+Lithuania,63.9,64.52,65.14,65.77,66.38,66.99,67.59,68.19,67.73,70.33,70.52,69.46,70.64,72.0,71.76,71.92,71.99,71.68,71.3,71.16,72.1,71.34,71.7,71.63,71.24,71.38,71.14,70.93,70.8,70.78,70.77,71.17,71.09,70.6,70.78,72.45,72.26,72.1,71.79,71.5,70.5,70.3,69.1,68.7,69.0,70.2,71.1,71.3,71.8,72.1,71.6,72.1,72.1,72.2,71.7,71.5,71.4,72.1,73.6,73.9,74.3,74.7,74.9,75.0,75.2,75.4 +Luxembourg,65.38,65.71,66.04,66.37,66.67,66.98,67.27,67.55,67.83,68.99,69.49,68.59,68.8,68.98,69.31,69.21,69.59,70.17,69.73,69.47,69.35,70.59,70.34,70.42,70.37,70.31,71.61,71.57,72.25,72.42,72.22,72.31,73.19,72.94,73.51,74.44,73.97,74.57,74.49,75.2,75.5,75.8,76.2,76.5,76.9,77.1,77.4,77.7,78.1,78.5,78.7,79.0,79.1,79.5,80.0,80.3,80.6,81.0,81.2,81.3,81.5,81.7,81.9,82.1,82.2,82.3 +"Macao, China",60.25,60.79,61.32,61.84,62.37,62.89,63.41,63.92,64.43,64.93,65.42,65.9,66.36,66.81,67.24,67.66,68.06,68.45,68.83,69.2,69.56,69.91,70.26,70.61,70.95,71.29,71.62,71.94,72.26,72.57,72.88,73.17,73.46,73.75,74.03,74.31,74.58,74.84,75.1,75.36,75.61,75.86,76.1,76.33,76.56,76.78,77.0,77.21,77.42,77.63,77.83,78.04,78.25,78.46,78.67,78.89,79.1,79.32,79.54,79.75,79.97,80.19,80.4,80.61,80.82,81.03 +"Macedonia, FYR",53.65,54.61,55.53,56.4,57.25,58.04,58.79,59.51,60.2,60.85,61.49,62.11,62.72,63.32,63.92,64.51,65.08,65.62,66.14,66.63,67.08,67.48,67.83,68.14,68.41,68.61,68.76,68.88,68.98,69.08,69.21,69.4,69.63,69.92,70.26,70.6,70.93,71.23,71.48,71.7,71.7,71.6,71.5,71.7,71.8,72.1,72.3,72.4,72.6,72.9,73.0,73.3,73.4,73.6,73.8,74.1,74.3,74.5,74.7,75.2,75.6,75.8,76.0,76.2,76.5,76.8 +Madagascar,36.69,37.28,37.86,38.45,39.03,39.62,40.21,40.79,41.38,41.96,42.54,43.12,43.7,44.28,44.85,45.43,46.01,46.6,47.18,47.77,48.36,48.94,49.5,50.06,50.59,51.12,51.63,52.12,52.58,53.01,53.36,53.64,53.86,54.03,54.19,54.38,54.63,54.98,55.43,56.0,56.2,56.4,56.3,56.8,57.2,57.6,58.0,58.3,58.8,59.1,59.6,59.8,60.1,60.6,61.2,61.7,62.0,62.2,62.3,62.4,62.6,62.8,63.0,63.3,63.5,63.7 +Malawi,36.45,36.62,36.81,37.02,37.24,37.48,37.72,37.99,38.25,38.51,38.76,39.02,39.25,39.49,39.75,40.03,40.36,40.73,41.16,41.62,42.09,42.55,43.0,43.41,43.79,44.16,44.54,44.92,45.31,45.72,46.13,46.53,46.91,47.26,47.6,47.9,48.17,48.42,48.64,48.8,48.6,48.3,48.0,47.4,46.9,46.3,45.8,45.3,45.1,45.4,45.9,46.4,47.0,47.5,48.5,49.6,51.0,52.4,53.9,55.4,56.6,58.0,59.3,60.1,60.5,60.9 +Malaysia,54.05,54.72,55.39,56.06,56.72,57.37,58.01,58.65,59.27,59.89,60.48,61.07,61.63,62.17,62.71,63.21,63.7,64.17,64.63,65.08,65.51,65.93,66.34,66.73,67.13,67.5,67.86,68.21,68.56,68.89,69.22,69.53,69.84,70.14,70.45,70.73,71.01,71.28,71.54,71.8,72.0,72.2,72.4,72.4,72.4,72.5,72.8,73.0,73.1,73.3,73.6,73.8,73.9,74.0,74.3,74.5,74.5,74.5,74.3,74.4,74.6,74.7,74.9,75.1,75.3,75.5 +Maldives,33.9,34.18,34.49,34.86,35.27,35.72,36.22,36.78,37.39,38.07,38.82,39.64,40.54,41.48,42.47,43.48,44.49,45.48,46.44,47.37,48.25,49.12,49.98,50.82,51.69,52.59,53.53,54.51,55.53,56.58,57.62,58.64,59.64,60.6,61.51,62.41,63.3,64.19,65.08,66.0,66.7,67.3,67.9,68.6,69.3,70.0,70.8,71.7,72.3,73.0,73.7,74.4,75.3,74.7,76.9,77.5,78.1,78.5,78.9,79.2,79.6,79.8,79.9,80.0,80.0,80.0 +Mali,27.34,27.71,28.04,28.34,28.6,28.84,29.04,29.23,29.42,29.61,29.83,30.08,30.4,30.79,31.26,31.8,32.41,33.07,33.77,34.51,35.27,36.04,36.82,37.61,38.39,39.18,39.97,40.79,41.61,42.45,43.3,44.17,45.02,45.86,46.68,47.45,48.18,48.86,49.47,50.0,50.5,50.8,51.2,51.2,51.4,51.8,52.2,50.9,53.5,53.5,54.1,54.6,55.5,56.2,56.9,57.4,58.0,58.5,58.9,59.2,59.6,59.8,59.8,60.0,60.2,60.4 
+Malta,66.02,66.17,66.35,66.55,66.79,67.06,67.34,67.65,67.97,68.32,68.67,69.02,69.37,69.7,70.03,70.36,70.67,70.98,71.29,71.6,71.9,72.2,72.49,72.78,73.07,73.36,73.63,73.92,74.19,74.47,74.74,75.01,75.28,75.54,75.81,76.08,76.33,76.59,76.84,77.1,77.3,77.5,77.9,78.2,78.4,78.5,78.8,78.9,79.0,79.2,79.4,79.8,80.1,80.3,80.7,81.0,80.9,80.7,81.2,81.3,81.3,81.6,81.7,82.0,82.1,82.2 +Martinique,54.51,55.23,55.93,56.61,57.28,57.93,58.57,59.2,59.81,60.41,61.0,61.58,62.16,62.72,63.28,63.84,64.39,64.93,65.46,65.99,66.51,67.02,67.53,68.02,68.51,69.0,69.47,69.93,70.38,70.82,71.25,71.68,72.09,72.5,72.9,73.29,73.67,74.05,74.42,74.79,75.15,75.51,75.86,76.2,76.54,76.88,77.22,77.55,77.88,78.19,78.5,78.78,79.05,79.31,79.55,79.78,80.01,80.24,80.48,80.71,80.95,81.18,81.41,81.64,81.86,82.08 +Mauritania,37.95,38.53,39.14,39.77,40.42,41.09,41.78,42.48,43.2,43.91,44.62,45.31,45.96,46.59,47.18,47.73,48.26,48.78,49.27,49.77,50.25,50.73,51.2,51.69,52.19,52.73,53.29,53.89,54.51,55.13,55.75,56.34,56.9,57.41,57.86,58.28,58.64,58.96,59.25,59.5,60.2,60.4,60.7,60.7,61.2,61.5,62.0,62.5,63.2,63.8,64.2,64.9,65.5,65.9,66.3,67.0,67.5,67.9,68.2,68.6,68.8,69.1,69.3,69.6,69.7,69.8 +Mauritius,48.57,49.61,50.68,51.78,52.92,54.09,55.28,56.46,57.63,58.74,59.75,60.64,61.38,61.97,62.4,62.67,62.85,62.97,63.05,63.14,63.27,63.45,63.68,63.99,64.37,64.83,65.34,65.87,66.41,66.92,67.34,67.7,67.96,68.14,68.26,68.38,68.53,68.74,68.99,69.3,69.6,69.7,69.8,70.0,70.3,70.5,70.7,71.0,71.2,71.4,71.6,71.7,71.9,72.1,72.4,72.5,72.7,72.9,73.2,73.4,73.7,74.1,74.2,74.3,74.5,74.7 +Mayotte,45.38,46.68,47.92,49.11,50.24,51.32,52.34,53.3,54.22,55.09,55.92,56.72,57.5,58.25,58.98,59.7,60.39,61.07,61.73,62.36,62.99,63.59,64.17,64.74,65.3,65.84,66.36,66.88,67.38,67.86,68.34,68.8,69.25,69.69,70.12,70.54,70.95,71.35,71.75,72.14,72.53,72.91,73.28,73.64,74.0,74.35,74.7,75.03,75.36,75.69,76.01,76.33,76.64,76.95,77.24,77.53,77.8,78.05,78.29,78.52,78.74,78.96,79.19,79.42,79.65,79.88 +Mexico,49.27,50.37,51.42,52.43,53.39,54.29,55.14,55.94,56.67,57.34,57.95,58.49,58.96,59.4,59.78,60.15,60.53,60.91,61.32,61.77,62.25,62.75,63.29,63.83,64.39,64.95,65.51,66.05,66.58,67.09,67.58,68.05,68.52,68.97,69.4,69.84,70.26,70.67,71.09,71.5,71.9,72.1,72.4,72.7,73.0,73.3,73.6,73.7,74.1,74.6,74.9,74.9,74.9,75.2,75.1,75.4,75.6,75.4,75.3,75.4,75.7,75.7,75.4,75.6,75.9,76.2 +"Micronesia, Fed. 
Sts.",53.56,53.92,54.28,54.65,55.01,55.37,55.73,56.09,56.45,56.82,57.18,57.54,57.9,58.26,58.63,58.99,59.36,59.73,60.1,60.48,60.89,61.3,61.71,62.12,62.5,62.85,63.14,63.37,63.53,63.64,63.71,63.77,63.81,63.86,63.92,63.99,64.07,64.15,64.23,64.3,64.5,64.7,64.9,65.1,65.4,65.7,65.9,66.1,66.3,66.6,66.8,66.0,67.3,67.4,67.6,67.7,67.9,68.0,68.1,68.3,68.4,68.6,68.7,68.8,68.9,69.0 +Moldova,58.5,58.96,59.42,59.85,60.27,60.68,61.07,61.46,61.84,62.22,62.61,62.99,63.38,63.77,64.14,64.48,64.78,65.03,65.23,65.39,65.48,65.55,65.58,65.6,65.6,65.57,65.52,65.47,65.41,65.4,65.48,65.68,65.98,66.38,66.83,67.29,67.69,67.98,68.16,68.2,67.4,67.6,67.4,65.8,65.4,66.1,67.9,68.5,68.4,68.6,69.2,69.6,69.9,70.2,69.5,69.8,70.0,70.4,70.6,70.5,72.3,72.4,73.3,73.6,73.9,74.2 +Mongolia,43.09,43.41,43.83,44.34,44.96,45.66,46.46,47.33,48.25,49.2,50.15,51.08,51.94,52.74,53.48,54.16,54.8,55.43,56.02,56.58,57.08,57.49,57.82,58.06,58.22,58.31,58.36,58.4,58.46,58.56,58.73,59.0,59.34,59.76,60.22,60.71,61.18,61.61,61.98,62.3,62.3,62.2,62.0,62.0,61.7,61.7,61.9,62.1,62.3,62.5,62.7,62.9,63.1,63.4,63.6,64.0,64.4,64.8,65.0,65.2,65.6,66.0,66.4,66.8,67.1,67.4 +Montenegro,59.32,59.59,59.91,60.31,60.78,61.3,61.87,62.5,63.17,63.86,64.54,65.21,65.86,66.47,67.05,67.62,68.19,68.78,69.36,69.94,70.48,70.99,71.41,71.78,72.07,72.33,72.55,72.75,72.95,73.16,73.35,73.52,73.68,73.83,73.96,74.08,74.21,74.35,74.47,74.6,74.4,74.2,73.9,73.7,73.5,73.4,73.3,73.1,73.0,73.3,73.5,74.0,74.5,74.8,75.0,75.2,75.6,76.0,76.3,76.5,76.7,76.8,76.9,77.1,77.2,77.3 +Morocco,45.84,46.21,46.58,46.98,47.39,47.81,48.25,48.7,49.17,49.64,50.11,50.6,51.09,51.58,52.06,52.54,53.0,53.46,53.91,54.34,54.77,55.19,55.62,56.08,56.56,57.11,57.72,58.39,59.13,59.93,60.77,61.63,62.49,63.33,64.14,64.91,65.66,66.38,67.06,67.7,68.1,68.4,68.6,69.1,69.5,70.0,70.4,70.8,71.1,71.5,71.8,72.0,72.3,72.5,72.7,72.9,73.1,73.3,73.5,73.7,73.9,74.1,74.3,74.4,74.6,74.8 +Mozambique,32.26,32.92,33.58,34.25,34.91,35.58,36.23,36.89,37.54,38.17,38.79,39.4,39.98,40.54,41.1,41.66,42.21,42.78,43.37,43.97,44.58,45.21,45.85,46.46,47.06,47.61,48.1,48.52,48.88,49.17,49.4,49.57,49.72,49.87,50.02,50.21,50.45,50.74,51.08,51.5,51.7,52.1,52.3,52.6,52.7,52.6,52.5,52.6,52.6,52.3,52.8,52.7,52.9,53.0,52.9,53.0,53.2,54.0,54.4,54.4,54.5,54.5,54.8,56.1,57.1,58.12 +Myanmar,33.8,35.24,36.53,37.69,38.71,39.6,40.36,41.03,41.65,42.25,42.9,43.64,44.47,45.4,46.4,47.39,48.31,49.11,49.78,50.31,50.72,51.09,51.44,51.78,52.15,52.54,52.93,53.31,53.69,54.07,54.44,54.8,55.16,55.52,55.87,56.23,56.58,56.93,57.26,57.6,57.8,58.1,58.4,58.8,59.0,59.4,59.7,60.1,60.4,60.8,61.3,61.7,62.3,62.8,63.4,64.0,64.6,59.4,65.6,66.0,66.4,66.8,67.2,67.6,68.0,68.4 +Namibia,40.72,41.49,42.23,42.96,43.69,44.39,45.09,45.76,46.42,47.07,47.7,48.31,48.9,49.48,50.05,50.61,51.17,51.71,52.26,52.81,53.36,53.91,54.44,54.98,55.51,56.04,56.56,57.07,57.57,58.06,58.54,59.01,59.45,59.87,60.27,60.65,61.0,61.3,61.54,61.7,61.9,62.0,62.0,61.5,60.5,59.3,58.1,56.7,55.4,54.0,53.4,52.7,52.4,52.5,53.1,54.9,57.5,59.1,60.3,61.4,62.6,63.6,63.9,64.1,64.2,64.3 +Nepal,35.53,36.0,36.48,36.96,37.43,37.9,38.38,38.85,39.32,39.8,40.26,40.74,41.21,41.67,42.14,42.6,43.05,43.51,43.97,44.43,44.91,45.41,45.92,46.47,47.05,47.64,48.28,48.94,49.63,50.32,51.06,51.81,52.57,53.36,54.17,54.98,55.83,56.68,57.53,58.4,59.1,60.0,60.2,61.0,61.7,62.5,63.4,63.9,64.6,65.2,65.9,65.9,66.8,67.0,67.4,67.8,68.1,68.4,68.7,69.0,69.3,69.7,69.9,70.2,69.7,69.2 
+Netherlands,71.5,72.12,71.7,72.39,72.51,72.52,72.97,73.13,73.17,73.35,73.54,73.21,73.33,73.71,73.58,73.52,73.79,73.6,73.51,73.57,73.81,73.72,74.17,74.56,74.49,74.61,75.2,75.11,75.59,75.72,75.93,76.01,76.21,76.28,76.34,76.31,76.78,76.98,76.82,77.0,77.2,77.3,77.2,77.5,77.6,77.6,77.9,78.1,78.0,78.1,78.3,78.5,78.7,79.1,79.6,79.9,80.2,80.3,80.6,80.8,80.9,81.0,81.2,81.3,81.3,81.3 +Netherlands Antilles,58.96,60.02,61.0,61.89,62.7,63.43,64.08,64.65,65.15,65.6,65.99,66.34,66.67,67.0,67.33,67.67,68.03,68.41,68.81,69.22,69.63,70.05,70.45,70.84,71.21,71.58,71.94,72.29,72.64,72.96,73.27,73.56,73.8,74.02,74.19,74.33,74.42,74.49,74.52,74.54,74.53,74.52,74.5,74.49,74.48,74.48,74.5,74.53,74.57,74.65,74.76,74.91,75.09,75.3,75.53,75.76,75.98,76.18,76.36,76.52,76.65,76.77,76.89,77.01,77.14,77.27 +New Caledonia,49.51,50.34,51.16,51.96,52.74,53.5,54.25,54.98,55.69,56.38,57.06,57.72,58.36,58.99,59.6,60.19,60.77,61.33,61.89,62.42,62.95,63.46,63.95,64.44,64.91,65.37,65.81,66.25,66.68,67.09,67.5,67.89,68.28,68.65,69.02,69.37,69.72,70.05,70.38,70.7,71.01,71.31,71.6,71.89,72.16,72.43,72.7,72.95,73.21,73.46,73.7,73.94,74.17,74.4,74.62,74.84,75.05,75.26,75.47,75.67,75.88,76.09,76.31,76.52,76.74,76.96 +New Zealand,69.17,69.4,70.25,70.36,70.49,70.75,70.27,70.9,70.82,71.28,71.0,71.26,71.33,71.37,71.3,71.16,71.54,71.2,71.57,71.35,71.8,71.92,71.78,72.03,72.3,72.5,72.25,73.14,73.18,72.98,73.77,73.87,73.97,74.53,74.03,74.28,74.36,74.64,75.05,75.6,75.9,76.2,76.5,76.7,77.0,77.3,77.6,78.0,78.2,78.4,78.6,78.9,79.1,79.4,79.8,79.9,80.1,80.3,80.5,80.8,80.8,81.1,81.4,81.4,81.4,81.4 +Nicaragua,43.38,44.18,44.98,45.78,46.59,47.4,48.22,49.04,49.86,50.69,51.53,52.36,53.19,54.04,54.88,55.74,56.6,57.47,58.33,59.18,60.01,60.8,61.56,62.28,62.95,63.59,64.17,64.73,65.28,65.83,66.38,66.95,67.56,68.2,68.89,69.67,70.51,71.4,72.35,73.3,73.7,73.6,73.9,74.1,74.4,74.7,75.0,73.2,75.6,76.0,76.2,76.3,76.3,76.4,76.6,76.7,76.8,77.0,77.1,77.2,77.4,77.5,77.6,77.8,78.0,78.2 +Niger,35.61,35.72,35.83,35.95,36.08,36.22,36.37,36.51,36.67,36.82,36.97,37.1,37.24,37.36,37.49,37.61,37.73,37.88,38.05,38.24,38.45,38.69,38.95,39.25,39.57,39.97,40.4,40.9,41.44,42.0,42.58,43.13,43.66,44.15,44.63,45.09,45.57,46.07,46.62,47.2,47.9,48.2,48.6,49.1,49.5,50.2,50.6,51.2,51.8,52.4,52.9,53.7,54.4,55.2,55.9,56.6,57.3,58.0,58.6,59.2,59.6,60.0,60.4,60.7,61.0,61.3 +Nigeria,35.25,35.74,36.25,36.79,37.35,37.93,38.53,39.14,39.76,40.39,41.0,41.61,42.19,42.75,43.29,43.81,38.31,33.47,31.63,41.79,46.56,47.16,47.77,48.38,49.0,49.62,50.24,50.84,51.42,51.95,52.41,52.8,53.12,53.36,53.54,53.67,53.78,53.88,53.98,54.1,54.3,54.4,54.5,54.9,55.0,55.0,55.0,55.1,55.2,55.2,55.4,55.3,55.6,56.1,56.8,57.4,58.3,59.2,60.3,61.2,62.0,62.6,63.3,63.7,64.6,65.51 +North Korea,26.78,24.76,31.74,42.66,46.7,48.18,49.16,49.73,50.43,50.9,51.25,51.64,52.15,52.86,53.76,54.84,55.97,57.07,58.1,59.06,59.93,60.74,61.5,62.22,62.88,63.49,64.04,64.53,64.98,65.39,65.75,66.08,66.4,66.69,67.0,67.36,67.78,68.22,68.63,68.9,69.2,69.4,69.6,69.7,58.6,58.7,58.8,58.9,59.0,59.1,59.2,59.3,69.9,70.0,70.2,70.4,70.6,70.9,71.0,71.2,71.4,71.6,71.8,71.9,72.1,72.3 +Norway,72.58,72.72,73.2,73.28,73.5,73.55,73.5,73.5,73.63,73.66,73.67,73.55,73.2,73.7,73.83,74.11,74.18,74.07,73.78,74.19,74.3,74.46,74.56,74.88,74.93,75.17,75.51,75.54,75.54,75.8,76.0,76.13,76.19,76.36,76.07,76.21,76.07,76.17,76.52,76.6,77.0,77.1,77.5,77.7,77.9,78.2,78.3,78.3,78.5,78.6,78.9,79.1,79.5,79.8,80.2,80.4,80.6,80.8,80.8,81.1,81.1,81.6,81.6,82.0,82.0,82.0 
+Oman,35.74,36.78,37.81,38.82,39.82,40.8,41.78,42.75,43.7,44.64,45.57,46.47,47.37,48.26,49.13,49.97,50.8,51.62,52.43,53.26,54.14,55.07,56.06,57.11,58.2,59.32,60.45,61.57,62.65,63.7,64.69,65.65,66.59,67.48,68.35,69.17,69.95,70.7,71.41,72.1,72.5,72.9,73.3,73.6,73.9,74.2,74.5,74.8,75.1,75.2,75.4,75.4,75.6,75.8,76.0,76.0,76.0,76.2,76.2,76.1,76.3,76.6,76.8,77.0,77.2,77.4 +Pakistan,36.85,38.07,39.26,40.42,41.56,42.67,43.75,44.8,45.81,46.79,47.73,48.63,49.47,50.27,51.01,51.7,52.34,52.95,53.52,54.06,54.6,55.12,55.64,56.16,56.68,57.17,57.63,58.05,58.44,58.79,59.13,59.45,59.77,60.09,60.43,60.77,61.11,61.45,61.78,62.1,62.2,62.1,62.0,61.9,61.8,61.9,61.8,62.0,62.1,62.3,62.5,62.6,62.8,63.1,62.2,63.7,63.8,64.1,64.3,64.5,64.9,65.1,65.4,65.6,65.9,66.2 +Panama,56.42,56.99,57.56,58.14,58.72,59.31,59.89,60.47,61.05,61.62,62.17,62.71,63.22,63.72,64.21,64.7,65.18,65.65,66.15,66.66,67.18,67.72,68.26,68.81,69.35,69.88,70.38,70.85,71.3,71.72,72.1,72.47,72.8,73.13,73.45,73.76,74.06,74.34,74.62,74.9,75.0,75.0,75.2,75.2,75.3,75.4,75.6,75.8,76.2,76.5,76.7,76.9,77.0,77.1,77.2,77.2,77.3,77.3,77.3,77.3,77.4,77.5,77.6,77.9,78.2,78.5 +Papua New Guinea,34.02,34.53,35.04,35.54,36.03,36.53,37.02,37.51,38.04,38.6,39.2,39.87,40.6,41.39,42.22,43.07,43.92,44.74,45.53,46.27,46.97,47.63,48.27,48.9,49.54,50.21,50.91,51.65,52.4,53.11,53.74,54.26,54.65,54.92,55.08,55.19,55.3,55.47,55.7,56.0,56.0,56.2,56.4,56.7,56.9,57.0,57.2,56.5,57.4,57.5,57.6,57.6,57.7,57.7,57.9,58.0,58.2,58.6,58.8,59.1,59.4,59.7,60.2,60.5,60.9,61.3 +Paraguay,64.04,64.16,64.33,64.52,64.76,65.03,65.33,65.65,66.0,66.35,66.7,67.03,67.33,67.61,67.87,68.11,68.37,68.63,68.9,69.2,69.49,69.78,70.06,70.32,70.57,70.81,71.04,71.28,71.51,71.73,71.97,72.19,72.41,72.64,72.87,73.11,73.36,73.62,73.91,74.2,74.2,74.1,74.1,74.0,74.1,74.1,74.2,74.2,74.3,74.2,74.2,74.1,74.1,73.8,74.0,74.0,74.0,74.0,74.0,74.0,74.0,74.1,74.1,74.3,74.4,74.5 +Peru,43.99,44.43,44.91,45.41,45.95,46.51,47.1,47.72,48.34,48.95,49.56,50.14,50.7,51.25,51.79,52.38,53.03,53.74,54.52,55.36,56.2,57.04,57.85,58.6,59.31,59.99,60.63,61.28,61.93,62.59,63.25,63.9,64.55,65.18,65.8,66.41,66.99,67.57,68.14,68.7,69.2,69.5,70.0,70.5,71.1,71.7,72.4,73.1,73.9,74.6,75.2,75.7,76.2,76.7,77.2,77.7,77.9,78.2,78.2,78.4,78.5,78.7,79.1,79.3,79.5,79.7 +Philippines,55.43,55.83,56.23,56.61,56.99,57.36,57.74,58.11,58.46,58.82,59.17,59.53,59.87,60.21,60.56,60.91,61.26,61.6,61.94,62.26,62.54,62.77,62.95,63.1,63.21,63.32,63.44,63.6,63.81,64.06,64.37,64.74,65.13,65.53,65.95,66.35,66.72,67.05,67.34,67.6,67.9,68.2,68.3,68.6,68.8,68.9,69.0,69.0,69.2,69.1,69.0,69.0,69.1,69.1,69.1,69.2,69.7,69.8,69.9,70.1,70.2,70.3,70.3,70.7,71.0,71.3 +Poland,59.68,60.87,61.96,62.97,63.9,64.74,65.5,65.97,65.59,67.92,68.04,67.71,68.64,68.87,69.58,69.99,69.69,70.33,69.83,69.96,69.76,70.95,70.95,71.46,70.88,70.88,70.78,70.71,71.05,70.4,71.38,71.45,71.29,71.03,70.78,71.07,71.12,71.49,71.25,70.9,70.7,71.1,71.7,71.7,71.9,72.4,72.7,73.0,73.1,73.8,74.2,74.6,74.9,75.0,75.1,75.2,75.2,75.4,75.7,76.2,76.5,76.7,77.3,77.4,77.6,77.8 +Portugal,58.71,59.81,61.11,62.25,61.42,61.22,61.49,63.79,62.97,64.23,62.85,64.37,65.0,65.22,66.17,65.67,66.57,66.88,66.49,67.14,66.91,69.23,68.63,69.18,68.9,69.12,70.37,70.83,71.64,71.71,71.9,72.73,72.65,72.94,73.22,73.61,74.0,74.02,74.58,74.2,74.2,74.6,74.7,75.5,75.5,75.5,75.8,76.1,76.4,76.8,76.8,77.3,77.6,78.2,78.4,79.0,79.2,79.4,79.6,79.9,80.2,80.4,80.7,80.7,80.8,80.9 +Puerto 
Rico,61.57,62.94,64.16,65.22,66.13,66.87,67.48,67.94,68.31,68.58,68.8,69.0,69.21,69.45,69.72,70.03,70.36,70.7,71.03,71.35,71.66,71.98,72.26,72.53,72.77,72.98,73.15,73.28,73.38,73.47,73.56,73.65,73.76,73.89,74.0,74.07,74.08,74.04,73.93,73.8,73.8,73.7,73.8,73.1,73.3,73.6,74.5,75.1,75.2,75.6,75.8,76.2,76.5,76.5,76.6,76.8,76.9,77.0,77.1,77.1,77.4,77.7,77.9,78.2,78.5,78.8 +Qatar,53.86,54.67,55.47,56.26,57.04,57.81,58.58,59.33,60.08,60.82,61.57,62.31,63.06,63.79,64.53,65.25,65.95,66.64,67.29,67.91,68.49,69.03,69.52,69.98,70.4,70.79,71.15,71.49,71.83,72.14,72.45,72.75,73.01,73.27,73.51,73.74,73.94,74.14,74.32,74.5,74.4,74.5,74.5,74.4,74.4,74.5,74.6,74.6,74.6,74.7,75.0,75.0,75.2,75.8,76.3,76.7,77.3,77.9,78.5,79.2,79.7,79.9,79.9,79.8,79.7,79.6 +Reunion,45.98,47.28,48.53,49.72,50.86,51.94,52.96,53.93,54.85,55.73,56.57,57.37,58.15,58.9,59.64,60.36,61.06,61.74,62.41,63.06,63.69,64.3,64.89,65.46,66.0,66.53,67.05,67.55,68.03,68.51,68.97,69.43,69.87,70.3,70.73,71.14,71.54,71.94,72.32,72.69,73.06,73.41,73.77,74.11,74.45,74.79,75.12,75.44,75.76,76.08,76.38,76.68,76.97,77.26,77.53,77.81,78.08,78.35,78.62,78.88,79.14,79.4,79.65,79.89,80.12,80.35 +Romania,61.13,61.07,61.19,61.47,61.93,62.54,63.29,64.14,65.04,65.92,66.7,67.32,67.74,67.96,68.02,67.98,67.95,68.01,68.16,68.41,68.73,69.06,69.34,69.58,69.75,69.87,69.95,70.01,70.06,70.1,70.12,70.11,70.1,70.08,70.05,70.02,70.0,69.98,69.99,70.0,70.5,70.0,69.8,69.5,69.4,69.1,69.1,69.8,70.6,71.1,71.1,71.2,71.6,72.0,72.4,72.8,73.3,73.2,73.3,73.7,74.5,74.7,74.9,75.1,75.2,75.3 +Russia,57.76,58.16,58.96,60.96,63.35,64.85,63.95,66.84,67.59,68.61,68.85,68.51,68.98,69.77,69.36,69.43,69.21,69.17,68.65,68.76,69.02,68.92,68.89,68.88,68.24,67.98,67.85,67.89,67.61,67.57,67.79,68.25,68.01,67.53,68.19,69.8,69.81,69.66,69.57,69.2,69.1,68.0,65.2,63.8,64.4,65.7,67.0,67.2,65.9,65.1,65.1,64.9,64.7,65.1,65.1,66.7,67.7,67.9,68.8,68.9,69.8,70.4,70.8,70.9,71.0,71.1 +Rwanda,39.99,40.32,40.66,41.0,41.34,41.69,42.03,42.38,42.73,43.07,43.41,43.74,44.05,44.35,44.62,44.85,45.07,45.27,45.44,45.58,45.71,45.81,45.91,46.01,46.13,46.31,46.54,46.81,47.12,47.46,47.88,48.32,48.69,48.88,49.15,49.42,49.69,49.96,50.23,50.5,49.3,48.0,46.7,13.2,43.8,44.6,44.0,45.6,47.2,49.2,51.0,53.5,55.5,57.6,59.6,61.6,63.1,64.1,64.3,65.1,65.3,65.5,65.6,65.7,65.9,66.1 +Samoa,46.08,46.69,47.3,47.9,48.5,49.09,49.69,50.28,50.87,51.45,52.04,52.62,53.21,53.8,54.39,54.98,55.57,56.15,56.75,57.33,57.92,58.5,59.09,59.67,60.26,60.84,61.44,62.02,62.62,63.2,63.79,64.36,64.94,65.51,66.1,66.67,67.27,67.87,68.49,69.1,69.1,69.5,69.7,69.8,70.0,70.2,70.4,70.6,70.7,70.8,71.0,71.2,71.4,71.6,71.8,72.0,72.1,72.3,70.4,72.6,72.7,72.7,73.0,73.1,73.2,73.3 +Sao Tome and Principe,46.1,46.54,47.01,47.52,48.05,48.6,49.18,49.77,50.38,51.01,51.62,52.21,52.79,53.36,53.92,54.47,55.03,55.6,56.19,56.81,57.47,58.13,58.81,59.47,60.09,60.63,61.08,61.42,61.67,61.83,61.93,62.02,62.12,62.24,62.4,62.59,62.79,63.0,63.2,63.4,63.5,63.6,63.7,64.0,64.1,63.9,63.9,64.0,64.4,64.6,64.9,65.0,65.3,65.4,65.5,65.7,65.7,66.0,66.7,66.9,67.2,67.4,67.6,67.8,68.0,68.2 +Saudi Arabia,42.31,42.89,43.47,44.05,44.64,45.23,45.82,46.42,47.02,47.62,48.22,48.84,49.48,50.15,50.88,51.68,52.55,53.51,54.55,55.65,56.82,58.04,59.26,60.48,61.67,62.83,63.95,65.01,66.03,66.99,67.89,68.74,69.54,70.3,71.01,71.66,72.28,72.85,73.39,73.9,74.3,74.6,74.9,75.1,75.5,75.8,76.0,76.3,76.6,76.8,77.1,77.2,77.4,77.5,77.8,77.9,78.2,78.3,78.5,78.7,78.9,79.2,79.3,79.4,79.5,79.6 
+Senegal,34.89,35.39,35.88,36.34,36.78,37.19,37.57,37.93,38.23,38.46,38.63,38.72,38.76,38.74,38.71,38.7,38.74,38.9,39.17,39.59,40.18,40.94,41.85,42.85,43.94,45.07,46.21,47.33,48.4,49.42,50.43,51.44,52.47,53.48,54.45,55.36,56.17,56.86,57.41,57.8,58.0,58.0,58.2,58.2,58.4,58.8,58.9,59.1,59.2,59.7,60.2,60.4,61.3,61.7,62.2,62.5,63.0,63.5,63.9,64.2,64.4,64.6,64.8,65.0,65.3,65.6 +Serbia,58.63,59.11,59.61,60.12,60.63,61.15,61.69,62.23,62.78,63.33,63.88,64.44,64.99,65.53,66.06,66.56,67.05,67.51,67.94,68.34,68.7,69.03,69.32,69.59,69.82,70.03,70.21,70.37,70.53,70.68,70.84,71.01,71.19,71.38,71.58,71.78,71.99,72.17,72.35,72.5,71.4,72.4,72.3,72.1,72.0,71.9,72.1,71.5,71.0,72.1,72.4,72.5,72.7,72.9,73.2,73.6,74.0,74.3,74.6,74.8,75.1,75.4,75.7,75.9,76.2,76.5 +Seychelles,57.55,57.43,57.45,57.57,57.82,58.18,58.65,59.19,59.8,60.42,61.03,61.59,62.08,62.47,62.81,63.11,63.43,63.78,64.18,64.62,65.11,65.59,66.06,66.49,66.9,67.26,67.59,67.89,68.16,68.4,68.63,68.83,69.02,69.17,69.3,69.36,69.37,69.31,69.22,69.1,69.1,69.2,69.3,69.6,69.8,69.9,70.1,70.4,70.7,70.9,71.1,71.3,71.5,71.7,72.0,72.3,72.6,72.9,73.0,73.1,73.4,73.7,73.8,74.0,74.1,74.2 +Sierra Leone,31.66,32.13,32.62,33.1,33.6,34.09,34.59,35.08,35.58,36.07,36.57,37.06,37.57,38.1,38.7,39.38,40.18,41.08,42.08,43.15,44.28,45.39,46.48,47.5,48.45,49.31,50.11,50.83,51.49,52.04,52.5,52.83,53.06,53.16,53.14,52.98,52.72,52.36,51.98,51.6,51.4,51.9,52.1,51.6,50.9,51.9,51.3,49.7,49.2,51.5,51.8,51.6,51.7,52.0,52.3,52.7,53.0,53.6,54.2,55.0,55.6,56.4,57.1,55.2,57.1,59.07 +Singapore,58.62,59.54,60.41,61.24,62.01,62.73,63.39,64.01,64.54,65.02,65.41,65.72,65.97,66.16,66.31,66.46,66.63,66.84,67.09,67.4,67.75,68.12,68.5,68.88,69.26,69.62,69.98,70.34,70.68,71.04,71.39,71.74,72.11,72.49,72.87,73.27,73.68,74.1,74.51,74.9,75.6,76.0,76.2,76.3,76.4,76.7,77.2,77.6,78.0,78.3,78.6,78.9,79.3,79.8,80.0,80.2,80.4,80.6,81.0,81.3,81.5,81.6,81.7,81.9,82.0,82.1 +Slovak Republic,61.35,64.4,65.7,66.76,67.89,68.42,67.51,69.41,69.09,70.42,70.86,70.4,70.79,71.17,70.39,70.53,71.07,70.6,69.91,69.84,69.99,70.46,70.16,70.33,70.45,70.62,70.58,70.59,70.92,70.58,70.82,70.94,70.64,70.88,70.89,71.07,71.24,71.32,71.12,71.0,71.1,71.4,71.9,72.3,72.4,72.8,72.8,72.8,73.0,73.3,73.6,73.8,73.9,74.2,74.3,74.5,74.6,74.9,75.2,75.7,76.1,76.5,77.0,77.4,77.6,77.8 +Slovenia,64.71,65.28,65.83,66.34,66.81,67.25,67.66,68.02,68.34,68.62,68.82,68.98,69.08,69.12,69.14,69.14,69.14,69.17,69.23,69.32,69.47,69.66,69.86,70.09,70.32,70.51,70.66,70.77,70.85,70.89,70.94,71.03,70.74,71.2,71.63,72.17,72.1,72.75,73.19,73.7,73.6,73.8,73.9,74.2,74.6,75.0,75.2,75.4,75.7,76.1,76.3,76.6,76.8,77.2,77.6,77.9,78.2,78.7,79.1,79.5,79.9,80.1,80.3,80.8,80.9,81.0 +Solomon Islands,45.39,45.97,46.53,47.11,47.68,48.26,48.83,49.41,49.98,50.55,51.12,51.69,52.27,52.84,53.42,54.0,54.58,55.16,55.74,56.31,56.91,57.52,58.13,58.74,59.33,59.9,60.43,60.89,61.27,61.53,61.59,61.46,61.16,60.74,60.26,59.84,59.58,59.52,59.7,60.1,60.0,60.4,60.6,60.9,61.1,61.4,61.5,61.6,61.7,61.7,61.7,61.7,61.7,61.7,61.8,61.9,61.9,62.3,62.4,62.7,63.0,63.3,63.5,63.6,64.0,64.4 +Somalia,34.13,34.6,35.07,35.54,36.01,36.47,36.94,37.41,37.87,38.34,38.8,39.26,39.74,40.21,40.68,41.14,41.61,42.08,42.54,42.99,43.44,43.9,44.35,44.8,45.24,45.7,46.15,46.6,47.03,47.46,47.88,48.28,48.65,48.98,49.24,49.36,49.34,49.19,48.98,48.8,47.4,48.4,49.7,49.7,49.9,49.9,49.6,50.3,50.4,50.7,50.9,51.1,51.5,51.6,52.1,52.2,52.4,52.6,52.8,51.6,52.0,53.4,54.1,54.3,54.2,54.1 +South 
Africa,43.92,44.67,45.37,46.03,46.63,47.19,47.71,48.17,48.6,49.01,49.4,49.78,50.14,50.52,50.91,51.3,51.68,52.04,52.41,52.77,53.11,53.44,53.77,54.11,54.47,54.86,55.3,55.77,56.29,56.85,57.44,58.04,58.64,59.22,59.78,60.32,60.83,61.29,61.69,62.0,62.5,62.4,63.0,62.8,62.7,61.6,60.0,58.9,57.9,56.4,55.9,54.8,53.7,52.8,52.7,52.5,53.0,53.4,53.9,54.9,56.6,59.0,60.7,61.2,61.3,61.4 +South Korea,40.52,40.02,45.02,48.02,49.55,50.22,50.9,51.6,52.3,53.02,53.75,54.51,55.27,56.04,56.84,57.67,58.54,59.44,60.35,61.22,62.02,62.73,63.34,63.84,64.26,64.62,64.95,65.31,65.7,66.15,66.66,67.21,67.78,68.37,68.98,69.58,70.18,70.75,71.29,71.8,72.2,72.7,73.1,73.6,74.0,74.5,74.9,75.4,75.8,76.3,76.7,77.1,77.7,78.2,78.7,79.1,79.4,79.8,80.1,80.4,80.6,80.7,80.9,80.9,81.0,81.1 +South Sudan,28.6,29.37,30.11,30.82,31.51,32.17,32.81,33.42,34.02,34.61,35.18,35.75,36.32,36.9,37.48,38.04,38.6,39.15,39.68,40.21,40.75,41.29,41.84,42.39,42.93,43.43,43.87,44.26,44.61,44.93,45.25,45.6,46.01,46.5,47.06,47.72,48.45,49.23,50.05,50.9,51.0,51.6,51.9,52.3,52.7,53.1,53.4,53.8,54.1,54.4,54.7,54.9,55.0,55.2,55.3,55.4,55.5,55.6,55.8,56.0,55.9,56.0,56.0,56.1,56.1,56.1 +Spain,61.5,64.92,65.79,66.98,66.75,66.79,66.63,68.82,68.74,69.23,69.62,69.65,69.81,70.54,70.95,71.2,71.39,71.68,71.21,72.19,71.79,73.0,72.78,73.16,73.49,73.81,74.32,74.51,75.05,75.53,75.67,76.22,76.0,76.38,76.34,76.59,76.82,76.82,76.89,76.9,77.0,77.4,77.6,77.8,77.9,78.1,78.6,78.8,78.8,79.2,79.5,79.6,79.6,80.0,80.3,80.7,80.8,81.1,81.5,81.8,82.0,82.2,82.5,82.5,82.6,82.7 +Sri Lanka,53.25,54.34,55.32,56.22,57.01,57.71,58.32,58.86,59.32,59.76,60.18,60.61,61.06,61.55,62.07,62.62,63.17,63.7,64.21,64.69,65.15,65.56,65.97,66.36,66.76,67.17,67.6,68.06,68.52,68.97,69.35,69.64,69.83,69.93,69.97,70.0,70.05,70.16,70.32,70.5,71.3,72.0,72.9,72.8,71.7,71.3,71.4,72.0,72.4,72.4,73.3,73.7,74.0,69.4,73.9,73.9,74.4,74.0,74.1,75.0,76.4,76.8,77.1,77.4,77.6,77.8 +St. Lucia,51.89,52.09,52.4,52.81,53.32,53.92,54.6,55.36,56.15,56.97,57.75,58.47,59.11,59.66,60.15,60.58,61.0,61.45,61.94,62.47,63.04,63.63,64.22,64.8,65.38,65.96,66.54,67.1,67.64,68.15,68.6,68.99,69.29,69.53,69.7,69.83,69.93,70.03,70.12,70.2,70.4,70.5,70.7,70.9,71.1,71.2,71.5,71.7,71.8,72.0,72.1,72.3,72.5,72.8,73.1,73.4,73.7,74.1,74.3,74.5,74.6,74.7,74.7,74.8,74.8,74.8 +St. 
Vincent and the Grenadines,50.11,50.59,51.19,51.89,52.69,53.58,54.57,55.63,56.73,57.85,58.96,59.99,60.93,61.75,62.46,63.06,63.58,64.04,64.46,64.84,65.16,65.43,65.64,65.82,65.99,66.16,66.36,66.61,66.88,67.19,67.52,67.84,68.15,68.43,68.68,68.92,69.14,69.34,69.53,69.7,69.7,69.7,69.7,69.6,69.6,69.4,69.7,69.8,69.6,69.1,69.7,69.7,70.1,70.2,70.4,70.6,70.8,70.9,71.1,71.1,71.0,71.1,70.8,71.1,71.2,71.3 +Sudan,44.44,45.08,45.71,46.31,46.88,47.45,48.0,48.53,49.04,49.54,50.04,50.52,50.99,51.47,51.94,52.42,52.9,53.36,53.82,54.26,54.68,55.06,55.41,55.73,56.0,56.23,56.44,56.63,56.8,56.95,57.11,57.27,57.44,57.61,57.81,58.01,58.23,58.44,58.67,58.9,59.2,59.4,59.5,60.2,60.5,60.6,60.8,61.2,62.0,62.4,62.8,63.3,63.5,63.7,64.6,64.9,65.3,65.5,65.7,66.1,66.3,66.7,66.9,67.2,67.5,67.8 +Suriname,55.52,56.24,56.93,57.57,58.16,58.72,59.24,59.71,60.16,60.58,61.0,61.41,61.81,62.23,62.65,63.07,63.49,63.89,64.28,64.66,65.0,65.32,65.62,65.91,66.19,66.47,66.76,67.07,67.39,67.71,68.02,68.31,68.57,68.79,68.98,69.15,69.29,69.43,69.57,69.7,69.9,69.8,69.7,69.8,70.1,70.2,70.2,70.1,69.9,69.7,69.5,69.4,69.5,69.7,69.9,70.0,70.1,70.2,70.5,70.7,71.0,71.3,71.6,71.8,72.0,72.2 +Swaziland,41.01,41.51,41.98,42.44,42.88,43.3,43.7,44.08,44.44,44.78,45.1,45.42,45.73,46.05,46.39,46.76,47.2,47.67,48.21,48.79,49.4,50.03,50.67,51.3,51.94,52.58,53.24,53.92,54.62,55.31,56.02,56.71,57.38,58.0,58.6,59.15,59.67,60.13,60.5,60.7,60.7,61.0,61.3,60.7,59.1,57.1,55.8,53.5,51.4,48.8,46.6,45.1,44.0,43.0,42.5,43.1,44.3,45.1,45.9,46.4,48.0,49.1,49.4,49.8,51.8,53.88 +Sweden,71.35,71.84,71.88,72.34,72.58,72.64,72.47,73.11,73.34,73.01,73.47,73.34,73.53,73.7,73.85,74.09,74.12,73.99,74.11,74.66,74.58,74.68,74.83,74.94,74.95,74.96,75.39,75.48,75.52,75.74,76.04,76.36,76.6,76.86,76.72,76.98,77.12,77.01,77.67,77.6,77.7,78.1,78.3,78.5,78.9,79.1,79.4,79.5,79.5,79.7,79.8,80.0,80.2,80.2,80.6,80.8,80.9,81.1,81.2,81.6,81.7,81.8,81.9,82.1,82.1,82.1 +Switzerland,68.72,69.63,69.55,70.02,70.1,70.23,70.58,71.32,71.48,71.46,71.79,71.35,71.34,72.23,72.36,72.5,72.8,72.75,72.76,73.18,73.3,73.82,74.12,74.47,74.86,74.98,75.43,75.39,75.69,75.69,75.92,76.26,76.27,76.87,76.99,77.17,77.47,77.49,77.68,77.5,77.6,77.9,78.3,78.4,78.5,79.1,79.2,79.5,79.8,79.8,80.2,80.4,80.6,81.0,81.3,81.5,81.7,82.0,82.0,82.3,82.6,82.7,82.8,82.9,83.0,83.1 +Syria,47.87,48.44,49.02,49.59,50.15,50.7,51.25,51.79,52.33,52.87,53.43,53.98,54.56,55.15,55.77,56.42,57.12,57.83,58.57,59.31,60.08,60.82,61.56,62.26,62.95,63.6,64.24,64.84,65.44,66.01,66.56,67.08,67.58,68.05,68.51,68.94,69.35,69.75,70.14,70.5,71.0,71.8,72.0,72.3,72.7,73.1,73.4,73.8,74.1,74.4,74.6,74.9,75.1,75.3,75.5,75.7,75.9,76.1,76.3,76.5,75.1,68.1,69.0,67.2,68.2,69.21 +Taiwan,55.11,58.51,60.31,62.01,62.41,62.51,62.41,64.21,64.22,64.42,64.92,65.22,66.02,66.72,67.42,67.42,67.52,67.62,68.62,68.67,69.08,69.38,69.43,69.8,70.05,70.41,70.58,71.15,71.28,71.53,71.63,72.14,72.12,72.79,72.98,73.11,73.4,73.22,73.53,73.8,74.2,74.3,74.5,74.6,74.6,74.7,75.2,75.4,75.3,76.0,76.4,76.9,77.3,77.3,77.4,77.8,78.2,78.4,78.7,79.0,78.8,79.0,79.3,79.4,79.5,79.6 +Tajikistan,52.94,53.4,53.87,54.33,54.79,55.26,55.72,56.17,56.64,57.1,57.57,58.03,58.51,58.98,59.45,59.9,60.34,60.77,61.17,61.55,61.9,62.23,62.53,62.81,63.08,63.34,63.57,63.81,64.04,64.28,64.53,64.8,65.07,65.34,65.55,65.69,65.73,65.67,65.5,65.3,65.3,62.6,64.2,64.1,64.1,63.3,64.8,64.9,65.5,65.8,66.1,66.5,66.9,67.5,68.0,68.7,69.2,69.6,70.0,70.1,70.1,70.8,71.4,71.9,72.4,72.9 
+Tanzania,41.66,42.19,42.69,43.18,43.63,44.05,44.46,44.84,45.22,45.57,45.91,46.26,46.62,46.99,47.37,47.77,48.19,48.62,49.07,49.53,50.03,50.55,51.09,51.65,52.19,52.71,53.19,53.61,53.98,54.29,54.56,54.82,55.05,55.26,55.44,55.54,55.58,55.51,55.39,55.2,55.1,54.7,54.5,54.0,53.9,53.8,53.8,53.7,53.8,54.3,54.8,55.4,55.9,56.5,57.1,57.9,59.1,60.4,60.8,61.4,61.7,61.9,62.7,63.3,64.1,64.91 +Thailand,51.14,51.5,51.9,52.32,52.78,53.28,53.8,54.35,54.91,55.46,56.01,56.51,56.98,57.4,57.8,58.18,58.56,58.96,59.39,59.86,60.33,60.82,61.29,61.77,62.24,62.7,63.15,63.62,64.1,64.62,65.22,65.91,66.69,67.52,68.36,69.15,69.84,70.38,70.76,71.0,71.0,70.9,70.8,70.6,70.6,70.6,70.5,70.4,70.5,70.7,71.2,71.7,72.1,72.2,73.1,73.5,73.8,73.9,74.0,74.2,74.3,74.4,74.4,74.6,74.7,74.8 +Timor-Leste,31.41,32.12,32.83,33.54,34.24,34.94,35.64,36.34,37.04,37.74,38.45,39.15,39.85,40.55,41.29,42.12,43.02,43.96,44.86,45.56,45.8,45.51,44.71,43.49,42.12,40.94,40.25,40.27,41.01,42.45,44.42,46.61,48.76,50.76,52.51,53.99,55.26,56.41,57.47,58.5,59.2,59.9,60.6,61.3,61.8,62.3,62.4,62.8,62.3,60.7,64.4,65.3,65.7,66.5,67.5,68.5,69.2,69.9,70.4,70.8,71.3,71.7,72.0,72.3,72.4,72.5 +Togo,34.69,35.42,36.15,36.86,37.57,38.28,38.98,39.68,40.38,41.06,41.74,42.42,43.1,43.77,44.43,45.09,45.75,46.41,47.07,47.72,48.36,49.0,49.63,50.26,50.88,51.49,52.09,52.7,53.29,53.87,54.43,54.97,55.48,55.96,56.39,56.79,57.14,57.42,57.65,57.8,57.8,57.9,57.8,57.6,57.6,57.3,56.9,56.6,56.8,56.7,56.7,56.7,56.4,56.8,56.8,57.5,57.5,57.5,58.0,58.7,59.6,60.3,60.7,61.1,61.5,61.9 +Tonga,58.0,58.35,58.7,59.05,59.41,59.77,60.12,60.48,60.84,61.2,61.56,61.91,62.26,62.6,62.94,63.27,63.61,63.93,64.26,64.58,64.88,65.17,65.44,65.69,65.93,66.16,66.39,66.61,66.84,67.08,67.32,67.56,67.8,68.04,68.27,68.48,68.67,68.83,68.98,69.1,69.3,69.4,69.5,69.5,69.6,69.7,69.7,69.7,69.6,69.6,69.6,69.7,69.6,69.8,70.0,70.1,70.2,70.3,68.6,70.7,70.8,71.0,71.2,71.3,71.5,71.7 +Trinidad and Tobago,57.36,57.85,58.39,58.98,59.61,60.27,60.97,61.68,62.38,63.07,63.68,64.2,64.63,64.94,65.17,65.31,65.41,65.5,65.6,65.73,65.91,66.11,66.33,66.57,66.83,67.08,67.33,67.54,67.73,67.89,68.03,68.16,68.28,68.39,68.5,68.62,68.74,68.87,68.98,69.1,69.3,69.2,69.3,69.2,69.3,69.3,69.4,69.6,69.3,69.5,69.8,69.9,70.4,70.9,71.1,71.3,71.5,71.7,71.8,71.8,71.9,72.0,72.1,72.3,72.4,72.5 +Tunisia,39.03,39.33,39.68,40.06,40.48,40.94,41.43,41.97,42.56,43.2,43.89,44.65,45.47,46.35,47.31,48.33,49.42,50.56,51.74,52.94,54.16,55.37,56.57,57.75,58.9,60.03,61.15,62.27,63.36,64.41,65.4,66.31,67.13,67.87,68.56,69.2,69.83,70.48,71.13,71.8,72.0,72.2,72.2,72.5,72.9,73.4,73.9,74.3,74.7,75.0,75.3,75.5,75.7,76.0,76.2,76.4,76.6,76.8,77.0,77.1,77.2,77.4,77.5,77.6,77.6,77.6 +Turkey,41.2,41.68,42.2,42.76,43.35,43.99,44.67,45.38,46.13,46.91,47.71,48.52,49.35,50.15,50.96,51.74,52.48,53.21,53.91,54.59,55.27,55.96,56.65,57.36,58.08,58.81,59.55,60.29,61.03,61.74,62.45,63.15,63.82,64.49,65.15,65.76,66.37,66.96,67.53,68.1,68.5,69.2,69.7,69.8,70.0,70.6,71.2,72.0,71.5,73.8,74.4,75.1,75.1,75.8,76.2,76.7,77.4,77.8,78.5,78.8,78.8,79.1,78.8,79.1,79.2,79.3 +Turkmenistan,50.89,51.34,51.79,52.25,52.69,53.14,53.58,54.03,54.47,54.91,55.36,55.82,56.27,56.72,57.17,57.61,58.02,58.42,58.8,59.15,59.46,59.74,60.01,60.25,60.49,60.73,61.0,61.28,61.58,61.9,62.24,62.57,62.89,63.18,63.44,63.63,63.77,63.85,63.89,63.9,63.5,63.5,63.5,63.4,63.3,63.2,63.2,63.3,63.5,63.7,64.1,64.4,64.8,65.3,65.8,66.3,66.8,67.2,67.6,68.1,68.5,68.9,69.2,69.6,70.0,70.4 
+Uganda,39.94,40.51,41.08,41.65,42.24,42.82,43.42,44.03,44.64,45.27,45.91,46.56,47.22,47.86,48.49,49.07,49.58,50.05,50.43,50.74,50.99,51.17,51.33,51.45,51.55,51.65,51.75,51.83,51.93,52.01,52.09,52.14,52.17,52.16,52.09,51.94,51.72,51.42,51.08,50.7,50.0,49.6,49.0,48.5,48.3,48.2,48.5,48.7,48.9,49.1,49.7,50.3,51.2,52.0,53.5,54.9,55.3,56.0,57.0,57.8,58.6,59.3,60.1,60.7,61.3,61.91 +Ukraine,62.2,62.94,63.63,64.42,66.26,67.15,67.19,68.88,69.26,70.88,71.15,70.56,71.18,71.97,71.4,71.66,71.25,71.33,70.7,70.59,70.81,70.57,70.75,70.63,69.96,70.01,69.68,69.63,69.36,69.33,69.36,69.51,69.48,69.17,69.47,70.82,70.61,70.49,70.43,70.0,69.4,68.8,68.3,67.5,66.5,66.7,67.3,68.1,67.7,67.3,67.5,67.5,67.7,67.5,67.1,67.9,67.6,67.8,69.6,70.5,71.1,71.2,71.3,71.3,71.5,71.7 +United Arab Emirates,41.83,43.04,44.22,45.37,46.5,47.62,48.7,49.77,50.82,51.85,52.89,53.91,54.91,55.9,56.87,57.81,58.7,59.54,60.33,61.08,61.78,62.45,63.09,63.7,64.3,64.87,65.41,65.93,66.43,66.91,67.36,67.79,68.2,68.6,68.98,69.34,69.68,70.0,70.31,70.6,70.8,71.1,71.3,71.6,71.9,72.1,72.4,72.8,73.0,73.3,73.6,73.8,74.1,74.4,75.2,75.7,75.6,75.6,75.6,75.6,75.5,75.5,75.4,75.4,75.4,75.4 +United Kingdom,68.26,69.55,69.82,70.19,70.15,70.42,70.54,70.71,70.81,71.02,70.77,70.84,70.74,71.53,71.52,71.43,72.06,71.68,71.64,71.89,72.2,71.98,72.18,72.38,72.65,72.62,73.11,73.04,73.14,73.57,73.9,74.03,74.28,74.66,74.51,74.78,75.12,75.23,75.36,75.7,76.0,76.2,76.3,76.6,76.7,76.9,77.1,77.3,77.5,77.8,78.0,78.2,78.4,78.7,79.0,79.2,79.5,79.7,80.0,80.2,80.5,80.7,80.8,80.9,81.0,81.1 +United States,68.22,68.44,68.79,69.58,69.63,69.71,69.49,69.76,69.98,69.91,70.32,70.21,70.04,70.33,70.41,70.43,70.76,70.42,70.66,70.92,71.24,71.34,71.54,72.08,72.68,72.99,73.38,73.58,74.03,73.93,74.36,74.65,74.71,74.81,74.79,74.87,75.01,75.02,75.1,75.4,75.5,75.8,75.7,75.8,75.9,76.3,76.6,76.8,76.9,76.9,76.9,77.1,77.3,77.6,77.6,77.8,78.1,78.3,78.5,78.8,78.9,79.0,79.1,79.1,79.1,79.1 +Uruguay,65.96,66.11,66.28,66.47,66.69,66.93,67.18,67.43,67.7,67.95,68.19,68.39,68.55,68.67,68.74,68.78,68.8,68.82,68.84,68.88,68.94,69.01,69.1,69.23,69.39,69.58,69.8,70.05,70.32,70.6,70.89,71.17,71.46,71.72,71.97,72.2,72.41,72.61,72.81,73.0,72.6,73.2,73.2,73.3,73.4,73.5,73.7,74.0,74.3,74.6,74.8,75.0,75.0,75.3,75.5,75.7,75.7,76.0,76.2,76.2,76.3,76.3,76.4,76.6,76.8,77.0 +Uzbekistan,55.32,55.78,56.23,56.68,57.13,57.58,58.02,58.46,58.91,59.35,59.8,60.25,60.7,61.15,61.59,62.02,62.43,62.83,63.2,63.54,63.86,64.14,64.4,64.64,64.87,65.11,65.37,65.64,65.94,66.25,66.59,66.93,67.27,67.59,67.85,68.02,68.09,68.06,67.96,67.8,67.6,67.3,67.0,66.7,66.6,66.7,66.9,67.1,67.4,67.6,67.8,67.9,68.1,68.3,68.5,68.8,69.2,69.6,69.9,70.2,70.6,70.9,71.2,71.5,71.8,72.1 +Vanuatu,40.79,41.36,41.94,42.51,43.09,43.67,44.24,44.82,45.4,45.97,46.55,47.14,47.71,48.29,48.87,49.44,50.01,50.56,51.12,51.67,52.21,52.77,53.33,53.89,54.46,55.05,55.64,56.24,56.83,57.41,57.97,58.5,58.98,59.44,59.87,60.27,60.67,61.07,61.48,61.9,62.0,62.1,62.2,62.2,62.3,62.4,61.2,62.5,62.0,62.5,62.5,62.5,62.5,62.6,62.7,62.9,63.2,63.4,63.6,63.9,64.1,64.4,64.6,64.7,64.9,65.1 +Venezuela,54.64,55.24,55.84,56.43,57.03,57.64,58.25,58.86,59.47,60.08,60.69,61.3,61.91,62.51,63.09,63.66,64.22,64.77,65.3,65.8,66.27,66.72,67.14,67.53,67.9,68.23,68.52,68.79,69.04,69.3,69.57,69.85,70.17,70.53,70.89,71.27,71.63,71.95,72.24,72.5,72.4,72.4,72.5,72.4,72.7,73.1,73.6,73.6,70.2,73.8,73.8,73.8,73.5,74.3,74.6,74.5,74.4,74.2,74.4,74.9,74.8,74.6,74.7,74.8,74.8,74.8 
+Vietnam,51.98,52.81,53.6,54.36,55.11,55.83,56.52,57.19,57.86,58.52,59.17,59.82,60.42,60.95,61.32,61.36,61.06,60.45,59.63,58.78,58.17,58.0,58.35,59.23,60.54,62.07,63.58,64.86,65.84,66.49,66.86,67.1,67.3,67.51,67.77,68.07,68.38,68.68,69.0,69.3,69.6,69.8,70.1,70.3,70.6,70.9,71.1,71.5,71.7,72.0,72.2,72.5,72.8,73.0,73.3,73.5,73.8,74.1,74.3,74.5,74.7,74.9,75.0,75.2,75.4,75.6 +Virgin Islands (U.S.),57.9,58.87,59.74,60.54,61.25,61.88,62.44,62.93,63.36,63.75,64.11,64.46,64.82,65.2,65.6,66.02,66.44,66.87,67.29,67.71,68.12,68.53,68.94,69.34,69.73,70.11,70.46,70.8,71.12,71.43,71.74,72.05,72.38,72.71,73.06,73.41,73.75,74.09,74.42,74.73,75.04,75.34,75.64,75.94,76.23,76.52,76.8,77.07,77.33,77.57,77.8,78.0,78.19,78.36,78.52,78.69,78.86,79.05,79.25,79.46,79.69,79.92,80.15,80.38,80.6,80.82 +West Bank and Gaza,47.03,47.31,47.63,47.97,48.36,48.78,49.23,49.72,50.25,50.82,51.43,52.08,52.75,53.47,54.2,54.94,55.7,56.45,57.22,57.97,58.73,59.48,60.26,61.03,61.81,62.6,63.39,64.18,64.96,65.74,66.48,67.21,67.92,68.59,69.23,69.82,70.38,70.88,71.36,71.8,72.0,72.4,72.8,73.3,73.7,74.0,74.2,74.5,74.7,74.4,74.7,74.4,74.4,74.4,74.6,74.4,74.3,74.1,73.8,74.3,74.2,74.2,74.4,74.5,74.6,74.7 +Western Sahara,34.95,35.33,35.72,36.1,36.48,36.86,37.24,37.62,37.99,38.37,38.75,39.12,39.5,39.88,40.26,40.62,40.97,41.32,41.67,42.07,42.52,43.07,43.7,44.43,45.23,46.11,47.01,47.92,48.82,49.72,50.61,51.5,52.4,53.3,54.17,54.99,55.74,56.43,57.04,57.59,58.09,58.56,59.03,59.51,60.0,60.51,61.04,61.57,62.11,62.64,63.15,63.65,64.13,64.58,65.01,65.41,65.79,66.16,66.51,66.84,67.17,67.47,67.76,68.04,68.3,68.56 +Yemen,24.0,24.96,25.92,26.87,27.84,28.8,29.76,30.72,31.68,32.64,33.58,34.52,35.45,36.37,37.27,38.15,39.01,39.87,40.71,41.55,42.4,43.28,44.17,45.1,46.05,47.05,48.06,49.08,50.11,51.13,52.13,53.09,54.02,54.89,55.69,56.4,57.04,57.6,58.08,58.5,58.9,59.3,59.6,59.7,60.3,60.7,61.1,61.5,62.0,62.4,62.8,63.3,63.7,64.2,64.6,65.0,65.2,65.7,66.2,66.6,66.6,66.7,67.1,67.1,66.0,64.92 +Zambia,43.22,43.79,44.38,44.95,45.53,46.1,46.67,47.24,47.79,48.34,48.89,49.42,49.94,50.44,50.96,51.49,52.05,52.64,53.25,53.88,54.51,55.13,55.71,56.24,56.7,57.07,57.36,57.57,57.66,57.62,57.45,57.14,56.71,56.17,55.54,54.85,54.09,53.33,52.59,51.9,50.7,49.6,48.6,47.7,46.9,46.3,45.9,45.4,45.0,44.8,44.9,45.1,45.3,46.3,47.1,47.9,49.0,51.1,52.3,53.1,53.7,54.7,55.6,56.3,56.7,57.1 +Zimbabwe,48.75,49.25,49.75,50.25,50.73,51.22,51.71,52.17,52.64,53.11,53.55,53.99,54.42,54.83,55.25,55.65,56.04,56.43,56.83,57.22,57.63,58.05,58.47,58.92,59.41,59.94,60.53,61.17,61.82,62.48,63.13,63.73,64.23,64.63,64.86,64.9,64.74,64.39,63.81,63.0,62.7,61.4,59.8,58.2,56.0,54.4,52.8,50.9,49.3,47.9,47.0,45.9,45.3,44.7,45.1,45.5,46.4,47.3,48.0,49.1,51.6,54.2,55.7,57.0,59.3,61.69 diff --git a/previous_versions/v0.4.0/data/offshore.csv b/previous_versions/v0.4.0/data/offshore.csv new file mode 100755 index 000000000..5aa096441 --- /dev/null +++ b/previous_versions/v0.4.0/data/offshore.csv @@ -0,0 +1,828 @@ +college_grad,response +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion 
+yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion 
+yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion 
+yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion diff --git a/previous_versions/v0.4.0/data/zinc_tidy.csv b/previous_versions/v0.4.0/data/zinc_tidy.csv new file mode 100755 index 000000000..84856e658 --- /dev/null +++ 
b/previous_versions/v0.4.0/data/zinc_tidy.csv @@ -0,0 +1,21 @@ +loc_id,location,concentration +1.0,bottom,0.43 +1.0,surface,0.415 +2.0,bottom,0.266 +2.0,surface,0.238 +3.0,bottom,0.567 +3.0,surface,0.39 +4.0,bottom,0.531 +4.0,surface,0.41 +5.0,bottom,0.707 +5.0,surface,0.605 +6.0,bottom,0.716 +6.0,surface,0.609 +7.0,bottom,0.651 +7.0,surface,0.632 +8.0,bottom,0.589 +8.0,surface,0.523 +9.0,bottom,0.469 +9.0,surface,0.411 +10.0,bottom,0.723 +10.0,surface,0.612 diff --git a/previous_versions/v0.4.0/images/Pie-I-have-Eaten.jpg b/previous_versions/v0.4.0/images/Pie-I-have-Eaten.jpg new file mode 100755 index 000000000..92464e41e Binary files /dev/null and b/previous_versions/v0.4.0/images/Pie-I-have-Eaten.jpg differ diff --git a/previous_versions/v0.4.0/images/apps.jpg b/previous_versions/v0.4.0/images/apps.jpg new file mode 100755 index 000000000..7ef7ea59a Binary files /dev/null and b/previous_versions/v0.4.0/images/apps.jpg differ diff --git a/previous_versions/v0.4.0/images/coggle.png b/previous_versions/v0.4.0/images/coggle.png new file mode 100755 index 000000000..668944334 Binary files /dev/null and b/previous_versions/v0.4.0/images/coggle.png differ diff --git a/previous_versions/v0.4.0/images/credit_card_balance_3D_scatterplot.png b/previous_versions/v0.4.0/images/credit_card_balance_3D_scatterplot.png new file mode 100755 index 000000000..054694d97 Binary files /dev/null and b/previous_versions/v0.4.0/images/credit_card_balance_3D_scatterplot.png differ diff --git a/previous_versions/v0.4.0/images/credit_card_balance_regression_plane.png b/previous_versions/v0.4.0/images/credit_card_balance_regression_plane.png new file mode 100755 index 000000000..d7037938b Binary files /dev/null and b/previous_versions/v0.4.0/images/credit_card_balance_regression_plane.png differ diff --git a/previous_versions/v0.4.0/images/dashboard.jpg b/previous_versions/v0.4.0/images/dashboard.jpg new file mode 100755 index 000000000..57996bf17 Binary files /dev/null and b/previous_versions/v0.4.0/images/dashboard.jpg differ diff --git a/previous_versions/v0.4.0/images/datacamp.png b/previous_versions/v0.4.0/images/datacamp.png new file mode 100755 index 000000000..2911de3c4 Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp.png differ diff --git a/previous_versions/v0.4.0/images/datacamp_inference_for_categorical_data.png b/previous_versions/v0.4.0/images/datacamp_inference_for_categorical_data.png new file mode 100755 index 000000000..17fcfa240 Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_inference_for_categorical_data.png differ diff --git a/previous_versions/v0.4.0/images/datacamp_inference_for_numerical_data.png b/previous_versions/v0.4.0/images/datacamp_inference_for_numerical_data.png new file mode 100755 index 000000000..811743c26 Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_inference_for_numerical_data.png differ diff --git a/previous_versions/v0.4.0/images/datacamp_inference_for_regression.png b/previous_versions/v0.4.0/images/datacamp_inference_for_regression.png new file mode 100755 index 000000000..143c4cee8 Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_inference_for_regression.png differ diff --git a/previous_versions/v0.4.0/images/datacamp_intermediate_R.png b/previous_versions/v0.4.0/images/datacamp_intermediate_R.png new file mode 100755 index 000000000..81b3cf7fb Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_intermediate_R.png differ diff --git 
a/previous_versions/v0.4.0/images/datacamp_intro_to_R.png b/previous_versions/v0.4.0/images/datacamp_intro_to_R.png new file mode 100755 index 000000000..193664acd Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_intro_to_R.png differ diff --git a/previous_versions/v0.4.0/images/datacamp_intro_to_modeling.png b/previous_versions/v0.4.0/images/datacamp_intro_to_modeling.png new file mode 100755 index 000000000..8bd13337a Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_intro_to_modeling.png differ diff --git a/previous_versions/v0.4.0/images/datacamp_intro_to_tidyverse.png b/previous_versions/v0.4.0/images/datacamp_intro_to_tidyverse.png new file mode 100755 index 000000000..69ca9772a Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_intro_to_tidyverse.png differ diff --git a/previous_versions/v0.4.0/images/datacamp_working_with_data.png b/previous_versions/v0.4.0/images/datacamp_working_with_data.png new file mode 100755 index 000000000..eeb4ac861 Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_working_with_data.png differ diff --git a/previous_versions/v0.4.0/images/engine.jpg b/previous_versions/v0.4.0/images/engine.jpg new file mode 100755 index 000000000..597512b49 Binary files /dev/null and b/previous_versions/v0.4.0/images/engine.jpg differ diff --git a/previous_versions/v0.4.0/images/errors.png b/previous_versions/v0.4.0/images/errors.png new file mode 100755 index 000000000..43c19d9a3 Binary files /dev/null and b/previous_versions/v0.4.0/images/errors.png differ diff --git a/previous_versions/v0.4.0/images/filter.png b/previous_versions/v0.4.0/images/filter.png new file mode 100755 index 000000000..8cd96205d Binary files /dev/null and b/previous_versions/v0.4.0/images/filter.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/flowchart.009-cropped.png b/previous_versions/v0.4.0/images/flowcharts/flowchart.009-cropped.png new file mode 100755 index 000000000..e14558e96 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/flowchart.009-cropped.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/flowchart.010-cropped.png b/previous_versions/v0.4.0/images/flowcharts/flowchart.010-cropped.png new file mode 100755 index 000000000..0ce574917 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/flowchart.010-cropped.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/flowchart.011-cropped.png b/previous_versions/v0.4.0/images/flowcharts/flowchart.011-cropped.png new file mode 100755 index 000000000..7c8b6c6a7 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/flowchart.011-cropped.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.002.png b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.002.png new file mode 100755 index 000000000..71139e1a1 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.002.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.004.png b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.004.png new file mode 100755 index 000000000..e78715c4d Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.004.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.005.png b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.005.png new file mode 100755 index 000000000..dce19ad70 
Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.005.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.006.png b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.006.png new file mode 100755 index 000000000..964f0ae8f Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.006.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/infer/calculate.png b/previous_versions/v0.4.0/images/flowcharts/infer/calculate.png new file mode 100755 index 000000000..83b51e66e Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/infer/calculate.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/infer/ci_diagram.png b/previous_versions/v0.4.0/images/flowcharts/infer/ci_diagram.png new file mode 100755 index 000000000..d9baa59f1 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/infer/ci_diagram.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/infer/generate.png b/previous_versions/v0.4.0/images/flowcharts/infer/generate.png new file mode 100755 index 000000000..d81baa6ff Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/infer/generate.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/infer/ht.png b/previous_versions/v0.4.0/images/flowcharts/infer/ht.png new file mode 100755 index 000000000..5effd3674 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/infer/ht.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/infer/ht_diagram.png b/previous_versions/v0.4.0/images/flowcharts/infer/ht_diagram.png new file mode 100755 index 000000000..582bdad19 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/infer/ht_diagram.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/infer/specify.png b/previous_versions/v0.4.0/images/flowcharts/infer/specify.png new file mode 100755 index 000000000..7f68e18b7 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/infer/specify.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/infer/visualize.png b/previous_versions/v0.4.0/images/flowcharts/infer/visualize.png new file mode 100755 index 000000000..895426ff3 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/infer/visualize.png differ diff --git a/previous_versions/v0.4.0/images/group_summary.png b/previous_versions/v0.4.0/images/group_summary.png new file mode 100755 index 000000000..2f09b0f0f Binary files /dev/null and b/previous_versions/v0.4.0/images/group_summary.png differ diff --git a/previous_versions/v0.4.0/images/guess_the_correlation.png b/previous_versions/v0.4.0/images/guess_the_correlation.png new file mode 100755 index 000000000..fefdb23b1 Binary files /dev/null and b/previous_versions/v0.4.0/images/guess_the_correlation.png differ diff --git a/previous_versions/v0.4.0/images/ht.png b/previous_versions/v0.4.0/images/ht.png new file mode 100755 index 000000000..204422828 Binary files /dev/null and b/previous_versions/v0.4.0/images/ht.png differ diff --git a/previous_versions/v0.4.0/images/iphone.jpg b/previous_versions/v0.4.0/images/iphone.jpg new file mode 100755 index 000000000..cf3a222a0 Binary files /dev/null and b/previous_versions/v0.4.0/images/iphone.jpg differ diff --git a/previous_versions/v0.4.0/images/ismay.jpeg b/previous_versions/v0.4.0/images/ismay.jpeg new file mode 100755 index 000000000..f68ead9ed Binary files /dev/null and 
b/previous_versions/v0.4.0/images/ismay.jpeg differ diff --git a/previous_versions/v0.4.0/images/join-inner.png b/previous_versions/v0.4.0/images/join-inner.png new file mode 100755 index 000000000..18e996daa Binary files /dev/null and b/previous_versions/v0.4.0/images/join-inner.png differ diff --git a/previous_versions/v0.4.0/images/kim.jpeg b/previous_versions/v0.4.0/images/kim.jpeg new file mode 100755 index 000000000..524aff3d5 Binary files /dev/null and b/previous_versions/v0.4.0/images/kim.jpeg differ diff --git a/previous_versions/v0.4.0/images/logos/book_cover.png b/previous_versions/v0.4.0/images/logos/book_cover.png new file mode 100755 index 000000000..f20fd9ef6 Binary files /dev/null and b/previous_versions/v0.4.0/images/logos/book_cover.png differ diff --git a/previous_versions/v0.4.0/images/logos/favicons/apple-touch-icon.png b/previous_versions/v0.4.0/images/logos/favicons/apple-touch-icon.png new file mode 100755 index 000000000..d28831d0b Binary files /dev/null and b/previous_versions/v0.4.0/images/logos/favicons/apple-touch-icon.png differ diff --git a/previous_versions/v0.4.0/images/logos/favicons/favicon.ico b/previous_versions/v0.4.0/images/logos/favicons/favicon.ico new file mode 100755 index 000000000..bddb10a6f Binary files /dev/null and b/previous_versions/v0.4.0/images/logos/favicons/favicon.ico differ diff --git a/previous_versions/v0.4.0/images/mutate.png b/previous_versions/v0.4.0/images/mutate.png new file mode 100755 index 000000000..ab15762b8 Binary files /dev/null and b/previous_versions/v0.4.0/images/mutate.png differ diff --git a/previous_versions/v0.4.0/images/read_excel.png b/previous_versions/v0.4.0/images/read_excel.png new file mode 100755 index 000000000..e9467bb82 Binary files /dev/null and b/previous_versions/v0.4.0/images/read_excel.png differ diff --git a/previous_versions/v0.4.0/images/relational-nycflights.png b/previous_versions/v0.4.0/images/relational-nycflights.png new file mode 100755 index 000000000..10b04ce0f Binary files /dev/null and b/previous_versions/v0.4.0/images/relational-nycflights.png differ diff --git a/previous_versions/v0.4.0/images/rstudio.png b/previous_versions/v0.4.0/images/rstudio.png new file mode 100755 index 000000000..e1d286545 Binary files /dev/null and b/previous_versions/v0.4.0/images/rstudio.png differ diff --git a/previous_versions/v0.4.0/images/sampling/shovel_025.jpg b/previous_versions/v0.4.0/images/sampling/shovel_025.jpg new file mode 100755 index 000000000..df2c5e1d2 Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling/shovel_025.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling/shovel_050.jpg b/previous_versions/v0.4.0/images/sampling/shovel_050.jpg new file mode 100755 index 000000000..68787cf3d Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling/shovel_050.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling/shovel_100.jpg b/previous_versions/v0.4.0/images/sampling/shovel_100.jpg new file mode 100755 index 000000000..1cc70a70f Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling/shovel_100.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling/tactile_1_b.jpg b/previous_versions/v0.4.0/images/sampling/tactile_1_b.jpg new file mode 100755 index 000000000..9a045406f Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling/tactile_1_b.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling/tactile_2_a.jpg b/previous_versions/v0.4.0/images/sampling/tactile_2_a.jpg new file mode 100755 index 
000000000..45b2791a9 Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling/tactile_2_a.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling/tactile_3_a.jpg b/previous_versions/v0.4.0/images/sampling/tactile_3_a.jpg new file mode 100755 index 000000000..50ef8b56f Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling/tactile_3_a.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling/tactile_3_c.jpg b/previous_versions/v0.4.0/images/sampling/tactile_3_c.jpg new file mode 100755 index 000000000..bd20120f3 Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling/tactile_3_c.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling_bowl_2.jpg b/previous_versions/v0.4.0/images/sampling_bowl_2.jpg new file mode 100755 index 000000000..48412bcfd Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling_bowl_2.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling_bowl_3_cropped.jpg b/previous_versions/v0.4.0/images/sampling_bowl_3_cropped.jpg new file mode 100755 index 000000000..a38e5d063 Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling_bowl_3_cropped.jpg differ diff --git a/previous_versions/v0.4.0/images/select.png b/previous_versions/v0.4.0/images/select.png new file mode 100755 index 000000000..a7329274a Binary files /dev/null and b/previous_versions/v0.4.0/images/select.png differ diff --git a/previous_versions/v0.4.0/images/sign-2408065_1920.png b/previous_versions/v0.4.0/images/sign-2408065_1920.png new file mode 100755 index 000000000..824dc86f0 Binary files /dev/null and b/previous_versions/v0.4.0/images/sign-2408065_1920.png differ diff --git a/previous_versions/v0.4.0/images/summarize1.png b/previous_versions/v0.4.0/images/summarize1.png new file mode 100755 index 000000000..e52e1d984 Binary files /dev/null and b/previous_versions/v0.4.0/images/summarize1.png differ diff --git a/previous_versions/v0.4.0/images/summary.png b/previous_versions/v0.4.0/images/summary.png new file mode 100755 index 000000000..86415225e Binary files /dev/null and b/previous_versions/v0.4.0/images/summary.png differ diff --git a/previous_versions/v0.4.0/images/tidy-1.png b/previous_versions/v0.4.0/images/tidy-1.png new file mode 100755 index 000000000..4287d74c6 Binary files /dev/null and b/previous_versions/v0.4.0/images/tidy-1.png differ diff --git a/previous_versions/v0.4.0/images/tidy1.png b/previous_versions/v0.4.0/images/tidy1.png new file mode 100755 index 000000000..88771ff58 Binary files /dev/null and b/previous_versions/v0.4.0/images/tidy1.png differ diff --git a/previous_versions/v0.4.0/index.html b/previous_versions/v0.4.0/index.html new file mode 100644 index 000000000..3fdbeb151 --- /dev/null +++ b/previous_versions/v0.4.0/index.html @@ -0,0 +1,941 @@ + + + + + + + + An Introduction to Statistical and Data Sciences via R + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

      1 Introduction


      1.1 Important Note


      This is a previous version (v0.4.0) of ModernDive and may be out of date. For the current version of ModernDive, please go to ModernDive.com.


      Help! I’m new to R and RStudio and I need to learn about them! However, I’m completely new to coding! What do I do? If you’re asking yourself this question, then you’ve come to the right place! Start with our Introduction for Students.

• Are you an instructor hoping to use this book in your courses? Then click here for more information on how to teach with this book.
• Are you looking to connect with and contribute to ModernDive? Then click here for information on how.
• Are you curious about the publishing of this book? Then click here for more information on the open-source technology, in particular R Markdown and the bookdown package.

      This is version 0.4.0 of ModernDive published on July 21, 2018. For previous versions of ModernDive, see Section 1.6.


      1.2 Introduction for students


      This book assumes no prerequisites: no algebra, no calculus, and no prior programming/coding experience. This is intended to be a gentle introduction to the practice of analyzing data and answering questions using data the way data scientists, statisticians, data journalists, and other researchers would.


In Figure 1.1 we present a flowchart of what you'll cover in this book. You'll first get started with data in Chapter 2, where you'll learn about the difference between R and RStudio, start coding in R, understand what R packages are, and explore your first dataset: all domestic departure flights from a New York City airport in 2013. Then:

1. Data science: You'll assemble your data science toolbox using tidyverse packages (see the short code sketch after this list). In particular:
   • Ch.3: Visualizing data via the ggplot2 package.
   • Ch.4: Understanding the concept of "tidy" data as a standardized data input format for all packages in the tidyverse.
   • Ch.5: Wrangling data via the dplyr package.
2. Data modeling: Using these data science tools and helper functions from the moderndive package, you'll start performing data modeling. In particular:
   • Ch.6: Constructing basic regression models.
   • Ch.7: Constructing multiple regression models.
3. Statistical inference: Once again using your newly acquired data science tools, we'll unpack statistical inference using the infer package. In particular:
   • Ch.8: Understanding the role that sampling variability plays in statistical inference using both tactile and virtual simulations of sampling from a "bowl" with an unknown proportion of red balls.
   • Ch.9: Building confidence intervals.
   • Ch.10: Conducting hypothesis tests.
4. Data modeling revisited: Armed with your new understanding of statistical inference, you'll revisit and review the models you constructed in Ch.6 & Ch.7. In particular:
   • Ch.11: Interpreting both the statistical and practical significance of the results of the models.

      We’ll end with a discussion on what it means to “think with data” in Chapter 12 and present an example case study data analysis of house prices in Seattle.

      Figure 1.1: ModernDive Flowchart

      1.2.1 What you will learn from this book


      We hope that by the end of this book, you’ll have learned

      1. How to use R to explore data.
      2. How to answer statistical questions using tools like confidence intervals and hypothesis tests.
      3. How to effectively create “data stories” using these tools.

      What do we mean by data stories? We mean any analysis involving data that engages the reader in answering questions with careful visuals and thoughtful discussion, such as How strong is the relationship between per capita income and crime in Chicago neighborhoods? and How many f**ks does Quentin Tarantino give (as measured by the amount of swearing in his films)?. Further discussions on data stories can be found in this Think With Google article.


      For other examples of data stories constructed by students like yourselves, look at the final projects for two courses that have previously used ModernDive:


      This book will help you develop your “data science toolbox”, including tools such as data visualization, data formatting, data wrangling, and data modeling using regression. With these tools, you’ll be able to perform the entirety of the “data/science pipeline” while building data communication skills (see Subsection 1.2.2 for more details).


      In particular, this book will lean heavily on data visualization. In today’s world, we are bombarded with graphics that attempt to convey ideas. We will explore what makes a good graphic and what the standard ways are to convey relationships with data. You’ll also see the use of visualization to introduce concepts like mean, median, standard deviation, distributions, etc. In general, we’ll use visualization as a way of building almost all of the ideas in this book.


      To impart the statistical lessons in this book, we have intentionally minimized the number of mathematical formulas used and instead have focused on developing a conceptual understanding via data visualization, statistical computing, and simulations. We hope this is a more intuitive experience than the way statistics has traditionally been taught and how it is commonly perceived.


      Finally, you’ll learn the importance of literate programming. By this we mean you’ll learn how to write code that is useful not just for a computer to execute but also for readers to understand exactly what your analysis is doing and how you did it. This is part of a greater effort to encourage reproducible research (see Subsection 1.2.3 for more details). Hal Abelson coined the phrase that we will follow throughout this book:


      “Programs must be written for people to read, and only incidentally for machines to execute.”


      We understand that there may be challenging moments as you learn to program. Both of us continue to struggle and often find ourselves using web searches to find answers and reaching out to colleagues for help. In the long run though, we all can solve problems faster and more elegantly via programming. We wrote this book as our way to help you get started, and you should know that there is a huge community of R users that are always happy to help everyone along as well. This community exists in particular on the internet on various forums and websites such as stackoverflow.com.


      1.2.2 Data/science pipeline


      You may think of statistics as just being a bunch of numbers. We commonly hear the phrase “statistician” when listening to broadcasts of sporting events. Statistics (in particular, data analysis), in addition to describing numbers like with baseball batting averages, plays a vital role in all of the sciences. You’ll commonly hear the phrase “statistically significant” thrown around in the media. You’ll see articles that say “Science now shows that chocolate is good for you.” Underpinning these claims is data analysis. By the end of this book, you’ll be able to better understand whether these claims should be trusted or whether we should be wary. Inside data analysis are many sub-fields that we will discuss throughout this book (though not necessarily in this order):

      • data collection
      • data wrangling
      • data visualization
      • data modeling
      • inference
      • correlation and regression
      • interpretation of results
      • data communication/storytelling

      These sub-fields are summarized in what Grolemund and Wickham term the “Data/Science Pipeline” in Figure 1.2.

      Figure 1.2: Data/Science Pipeline

      We will begin by digging into the gray Understand portion of the cycle with data visualization, then with a discussion on what is meant by tidy data and data wrangling, and then conclude by talking about interpreting and discussing the results of our models via Communication. These steps are vital to any statistical analysis. But why should you care about statistics? “Why did they make me take this class?”


      There’s a reason so many fields require a statistics course. Scientific knowledge grows through an understanding of statistical significance and data analysis. You needn’t be intimidated by statistics. It’s not the beast that it used to be and, paired with computation, you’ll see how reproducible research in the sciences particularly increases scientific knowledge.


      1.2.3 Reproducible research


      “The most important tool is the mindset, when starting, that the end product will be reproducible.” – Keith Baggerly


      Another goal of this book is to help readers understand the importance of reproducible analyses. The hope is to get readers into the habit of making their analyses reproducible from the very beginning. This means we’ll be trying to help you build new habits. This will take practice and be difficult at times. You’ll see just why it is so important to keep track of your code and to document it well, both to help yourself later and to help any potential collaborators.


      Copying and pasting results from one program into a word processor is not the way that efficient and effective scientific research is conducted. It’s much more important for time to be spent on data collection and data analysis and not on copying and pasting plots back and forth across a variety of programs.


      In a traditional analysis, if an error was made in the original data, we’d need to step through the entire process again: recreate the plots and copy and paste all of the new plots and our statistical analysis into our document. This is error-prone and a frustrating use of time. We’ll see how to use R Markdown to get away from this tedious activity so that we can spend more time doing science.
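      As a rough sketch of what this looks like in practice (using the ggplot2 and nycflights13 packages the book leans on throughout; the chunk shown here is illustrative, not taken from the book), a code chunk like the following placed inside an R Markdown document rebuilds its figure directly from the raw data every time the document is knit, so nothing ever needs to be copied and pasted:

```r
# Illustrative sketch of code you might put inside an R Markdown chunk:
# knitting the document re-runs it, so the histogram is always rebuilt
# from the raw data rather than pasted in from another program.
library(ggplot2)
library(nycflights13)

ggplot(data = weather, mapping = aes(x = temp)) +
  geom_histogram(bins = 40, color = "white")
```

      If the underlying data are later corrected, re-knitting the document regenerates the figure and every number derived from it in one step.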


      “We are talking about computational reproducibility.” - Yihui Xie


      Reproducibility means different things in different scientific fields. Are experiments conducted in a way that another researcher could follow the steps and get similar results? In this book, we will focus on what is known as computational reproducibility. This refers to being able to pass all of one’s data analysis, data-sets, and conclusions to someone else and have them get exactly the same results on their machine. This allows for time to be spent interpreting results and considering assumptions instead of the more error-prone way of starting from scratch or following a list of steps that may be different from machine to machine.


      1.2.4 Final note for students


      At this point, if you are interested in instructor perspectives on this book, ways to contribute and collaborate, or the technical details of this book’s construction and publishing, then continue with the rest of the chapter below. Otherwise, let’s get started with R and RStudio in Chapter 2!


      1.3 Introduction for instructors


      This book is inspired by the following books:

      • “Mathematical Statistics with Resampling and R” (Chihara and Hesterberg 2011),
      • “OpenIntro: Intro Stat with Randomization and Simulation” (Diez, Barr, and Çetinkaya-Rundel 2014), and
      • “R for Data Science” (Grolemund and Wickham 2016).

      The first book, while designed for upper-level undergraduates and graduate students, provides an excellent resource on how to use resampling to impart statistical concepts like sampling distributions using computation instead of large-sample approximations and other mathematical formulas. The last two books are free options for learning introductory statistics and data science, providing an alternative to the many traditionally expensive introductory statistics textbooks.


      When looking over the large number of introductory statistics textbooks that currently exist, we found that there wasn’t one that incorporated many newly developed R packages directly into the text, in particular the many packages included in the tidyverse collection of packages, such as ggplot2, dplyr, tidyr, and broom. Additionally, there wasn’t an open-source and easily reproducible textbook available that exposed new learners to all three of the learning goals listed at the outset of Subsection 1.2.1.


      1.3.1 Who is this book for?


      This book is intended for instructors of traditional introductory statistics classes using RStudio, either the desktop or server version, who would like to inject more data science topics into their syllabus. We assume that students taking the class will have no prior algebra, calculus, or programming/coding experience.


      Here are some principles and beliefs we kept in mind while writing this text. If you agree with them, this might be the book for you.

      1. Blur the lines between lecture and lab
         • With increased availability and accessibility of laptops and open-source non-proprietary statistical software, the strict dichotomy between lab and lecture can be loosened.
         • It’s much harder for students to understand the importance of using software if they only use it once a week or less. They forget the syntax in much the same way someone learning a foreign language forgets the rules. Frequent reinforcement is key.
      2. Focus on the entire data/science research pipeline
      3. It’s all about the data
         • We leverage R packages for rich, real, and realistic data-sets that at the same time are easy to load into R, such as the nycflights13 and fivethirtyeight packages.
         • We believe that data visualization is a gateway drug for statistics and that the Grammar of Graphics as implemented in the ggplot2 package is the best way to impart such lessons. However, we often hear: “You can’t teach ggplot2 for data visualization in intro stats!” We, like David Robinson, are much more optimistic.
         • dplyr has made data wrangling much more accessible to novices, and hence much more interesting data-sets can be explored.
      4. Use simulation/resampling to introduce statistical inference, not probability/mathematical formulas
         • Instead of using formulas, large-sample approximations, and probability tables, we teach statistical concepts using resampling-based inference.
         • This allows for a de-emphasis of traditional probability topics, freeing up room in the syllabus for other topics.
      5. Don’t fence off students from the computation pool, throw them in!
         • Computing skills are essential to working with data in the 21st century. Given this fact, we feel that to shield students from computing is to ultimately do them a disservice.
         • We are not teaching a course on coding/programming per se, but rather just enough of the computational and algorithmic thinking necessary for data analysis.
      6. Complete reproducibility and customizability
         • We are frustrated when textbooks give examples, but not the source code and the data itself. We give you the source code for all examples as well as the whole book!
         • Ultimately the best textbook is one you’ve written yourself. You know best your audience, their background, and their priorities. You know best your own style and the types of examples and problems you like best. Customization is the ultimate end. For more about how to make this book your own, see About this Book.

      1.4 DataCamp


      DataCamp is a browser-based interactive platform for learning data science, offering a wide array of courses on data science, analytics, statistics, machine learning, and artificial intelligence, where each course is a combination of lectures and exercises that offer immediate feedback.


      The following chapters of ModernDive roughly map to closely integrated DataCamp courses that use the same R tools and often even the same datasets. By no means is this an exhaustive list of DataCamp courses relevant to the topics in this book; rather, we recommend these ones in particular to supplement your ModernDive experience.


      Click on the image for each course to access its webpage on datacamp.com. Instructors at accredited universities can sign their class up for a free academic license at DataCamp For The Classroom, giving their students free access to all premium courses for 6 months.

      Chapter | Topic | DataCamp Courses
      2       | Basic R programming concepts | (course images)
      3 & 5   | Introductory data visualization and wrangling | (course image)
      4 & 5   | Data “tidying” and intermediate data wrangling | (course image)
      6 & 7   | Data modeling, basic regression, and multiple regression | (course image)
      9 & 10  | Statistical inference: confidence intervals and hypothesis testing | (course images)
      11      | Inference for regression | (course image)

      1.5 Connect and contribute


      If you would like to connect with ModernDive, check out the following links:


      If you would like to contribute to ModernDive, there are many ways! Let’s all work together to make this book as great as possible for as many students and instructors as possible!

      • Please let us know if you find any errors, typos, or areas for improvement on our GitHub issues page.
      • If you are familiar with GitHub and would like to contribute more, please see Section 1.6 below.

      The authors would like to thank Nina Sonneborn, Kristin Bott, and the participants of our USCOTS 2017 workshop for their feedback and suggestions. A special thanks goes to Prof. Yana Weinstein, cognitive psychological scientist and co-founder of The Learning Scientists, for her extensive contributions.


      1.6 About this book


      This book was written using RStudio’s bookdown package by Yihui Xie (Xie 2018). This package simplifies the publishing of books by having all content written in R Markdown. The bookdown/R Markdown source code for all versions of ModernDive is available on GitHub:


      Could this be a new paradigm for textbooks? Instead of the traditional model of textbook companies publishing updated editions of the textbook every few years, we apply a software-design-influenced model of publishing more easily updated versions. We can then leverage open-source communities of instructors and developers for ideas, tools, resources, and feedback. As such, we welcome your pull requests.


      Finally, feel free to modify the book as you wish for your own needs, but please list the authors at the top of index.Rmd as “Chester Ismay, Albert Y. Kim, and YOU!”
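      As a minimal sketch of that customization workflow (assuming you have cloned your own copy of the repository and installed the book’s other package dependencies), the entire book can be rebuilt from its R Markdown source with the bookdown package:

```r
# Illustrative sketch: rebuild the book from a local clone of the repository.
# Assumes the working directory is the repository root and that the book's
# other package dependencies are already installed.
install.packages("bookdown")        # if bookdown is not already installed
bookdown::render_book("index.Rmd")  # recompile the whole book from source
```

      After editing any chapter’s .Rmd file, re-running render_book() produces an updated copy of the book with your changes in place.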


      1.7 About the authors


      Who we are!

      Chester Ismay | Albert Y. Kim
      (author photo) | (author photo)
      + + + + + + + + + + + + + + diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot1-1.png new file mode 100644 index 000000000..2ec3bb210 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot4-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot4-1.png new file mode 100644 index 000000000..1c40650d5 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot4-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot5-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot5-1.png new file mode 100644 index 000000000..75f1b3198 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot5-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/alpha-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/alpha-1.png new file mode 100644 index 000000000..e64a9d291 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/alpha-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/badbox-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/badbox-1.png new file mode 100644 index 000000000..b0424a705 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/badbox-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/bar-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/bar-1.png new file mode 100644 index 000000000..eac2ae532 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/bar-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/boxplot-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/boxplot-1.png new file mode 100644 index 000000000..b461e4d8d Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/boxplot-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/carrierpie-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/carrierpie-1.png new file mode 100644 index 000000000..c1b23c5d0 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/carrierpie-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot0b-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot0b-1.png new file mode 100644 index 000000000..7533ef4c0 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot0b-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot1-1.png new file mode 100644 index 000000000..59079169e Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot7-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot7-1.png new file mode 100644 index 000000000..c373f0614 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot7-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot8-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot8-1.png new file mode 100644 index 000000000..52ec8d8c9 Binary files 
/dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot8-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/comparing-sampling-distributions-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/comparing-sampling-distributions-1.png new file mode 100644 index 000000000..8e38890bd Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/comparing-sampling-distributions-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/correlation1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/correlation1-1.png new file mode 100644 index 000000000..bc07984da Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/correlation1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/correlation2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/correlation2-1.png new file mode 100644 index 000000000..19c3d3ce4 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/correlation2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/credit-limit-quartiles-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/credit-limit-quartiles-1.png new file mode 100644 index 000000000..05fd2a2c6 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/credit-limit-quartiles-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/facet-bar-vert-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/facet-bar-vert-1.png new file mode 100644 index 000000000..ae1e27ce4 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/facet-bar-vert-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/facethistogram-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/facethistogram-1.png new file mode 100644 index 000000000..b15743ff0 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/facethistogram-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/flightsbar-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/flightsbar-1.png new file mode 100644 index 000000000..5c819748e Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/flightsbar-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/flightscol-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/flightscol-1.png new file mode 100644 index 000000000..6c974ac5c Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/flightscol-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/gapminder-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/gapminder-1.png new file mode 100644 index 000000000..945c2a767 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/gapminder-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/geombar-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/geombar-1.png new file mode 100644 index 000000000..ac51b9a4c Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/geombar-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/geomcol-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/geomcol-1.png new file mode 100644 index 000000000..0e7da0c56 Binary files /dev/null and 
b/previous_versions/v0.4.0/ismaykim_files/figure-html/geomcol-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/guatline-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/guatline-1.png new file mode 100644 index 000000000..f90e86717 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/guatline-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/here-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/here-1.png new file mode 100644 index 000000000..21cc533e4 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/here-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/hist-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/hist-1.png new file mode 100644 index 000000000..9aabd2385 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/hist-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/hist1a-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/hist1a-1.png new file mode 100644 index 000000000..c22482a88 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/hist1a-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/hist1b-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/hist1b-1.png new file mode 100644 index 000000000..ed39c46a3 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/hist1b-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/hourlytemp-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/hourlytemp-1.png new file mode 100644 index 000000000..5289175ca Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/hourlytemp-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-1.png new file mode 100644 index 000000000..1aff0c4d4 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-2-1.png new file mode 100644 index 000000000..87b9e1d29 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-3-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-3-1.png new file mode 100644 index 000000000..f7be7b354 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-3-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-parallel-slopes-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-parallel-slopes-1.png new file mode 100644 index 000000000..ae2dc50a1 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-parallel-slopes-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/house-prices-viz-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-prices-viz-1.png new file mode 100644 index 000000000..e24767ad9 Binary files /dev/null and 
b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-prices-viz-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-1.png new file mode 100644 index 000000000..89a84cdec Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-example-plot-1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-example-plot-1-1.png new file mode 100644 index 000000000..35d0b1184 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-example-plot-1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-example-plot-2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-example-plot-2-1.png new file mode 100644 index 000000000..1b56768ff Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-example-plot-2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/lifeExp2007hist-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/lifeExp2007hist-1.png new file mode 100644 index 000000000..9d795fdd9 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/lifeExp2007hist-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/log10-price-viz-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/log10-price-viz-1.png new file mode 100644 index 000000000..fa5dcdc0c Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/log10-price-viz-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/log10-size-viz-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/log10-size-viz-1.png new file mode 100644 index 000000000..b57a36c43 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/log10-size-viz-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/model1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/model1-1.png new file mode 100644 index 000000000..a9ba01c3c Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/model1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/model1residualshist-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/model1residualshist-1.png new file mode 100644 index 000000000..99c74f9f3 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/model1residualshist-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/model2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/model2-1.png new file mode 100644 index 000000000..e4bf82faa Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/model2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/model3-residuals-hist-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/model3-residuals-hist-1.png new file mode 100644 index 000000000..1dba66321 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/model3-residuals-hist-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox-1.png new file mode 100644 index 000000000..b1494c193 Binary files /dev/null and 
b/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox2-1.png new file mode 100644 index 000000000..677284899 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox3-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox3-1.png new file mode 100644 index 000000000..2b7c97f70 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox3-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/movie-hist-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/movie-hist-1.png new file mode 100644 index 000000000..2f468ce06 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/movie-hist-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/noalpha-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/noalpha-1.png new file mode 100644 index 000000000..9cbd57423 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/noalpha-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/nolayers-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/nolayers-1.png new file mode 100644 index 000000000..07492d6b9 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/nolayers-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxcatxplot1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxcatxplot1-1.png new file mode 100644 index 000000000..80f396a81 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxcatxplot1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxcatxplot2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxcatxplot2-1.png new file mode 100644 index 000000000..22004abf2 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxcatxplot2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot1-1.png new file mode 100644 index 000000000..0c3d6fc59 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot2-1.png new file mode 100644 index 000000000..257b8868f Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot2-a-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot2-a-1.png new file mode 100644 index 000000000..f51ed808e Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot2-a-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot3-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot3-1.png new file mode 100644 index 000000000..b16443c75 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot3-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot4-1.png 
b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot4-1.png new file mode 100644 index 000000000..a02e9ea6d Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot4-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot5-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot5-1.png new file mode 100644 index 000000000..cbd0db485 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot5-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot6-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot6-1.png new file mode 100644 index 000000000..250e5aba0 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot6-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot7-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot7-1.png new file mode 100644 index 000000000..9fe9d9ddb Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot7-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot9-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot9-1.png new file mode 100644 index 000000000..b1bbe1929 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot9-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/pvaloneprop-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/pvaloneprop-1.png new file mode 100644 index 000000000..354794075 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/pvaloneprop-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/qqplotmean-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/qqplotmean-1.png new file mode 100644 index 000000000..6497fc940 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/qqplotmean-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/residual1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/residual1-1.png new file mode 100644 index 000000000..894639541 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/residual1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/residual2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/residual2-1.png new file mode 100644 index 000000000..e4d3802a7 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/residual2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-tactile-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-tactile-1.png new file mode 100644 index 000000000..f223722fe Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-tactile-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-virtual-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-virtual-1.png new file mode 100644 index 000000000..2893c6c1b Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-virtual-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-virtual-1000-1.png 
b/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-virtual-1000-1.png new file mode 100644 index 000000000..68d82d93f Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-virtual-1000-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/stacked_bar-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/stacked_bar-1.png new file mode 100644 index 000000000..07c033577 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/stacked_bar-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/tactile-conf-int-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/tactile-conf-int-1.png new file mode 100644 index 000000000..44a656cb9 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/tactile-conf-int-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/tactile-vs-virtual-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/tactile-vs-virtual-1.png new file mode 100644 index 000000000..45267ae95 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/tactile-vs-virtual-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-121-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-121-1.png new file mode 100644 index 000000000..ca28c859b Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-121-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-211-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-198-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-211-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-198-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-212-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-199-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-212-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-199-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-244-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-226-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-244-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-226-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-240-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-240-1.png new file mode 100644 index 000000000..2ec3bb210 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-240-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-26-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-26-1.png new file mode 100644 index 000000000..2eff27fd8 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-26-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-27-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-27-1.png new file mode 100644 index 000000000..4cd5d2522 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-27-1.png differ diff --git 
a/docs/ismaykim_files/figure-html/unnamed-chunk-294-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-275-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-294-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-275-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-279-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-279-1.png new file mode 100644 index 000000000..b15d00000 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-279-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-28-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-28-1.png new file mode 100644 index 000000000..52137cb8e Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-28-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-282-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-282-1.png new file mode 100644 index 000000000..aba8f4003 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-282-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-29-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-29-1.png new file mode 100644 index 000000000..610bc7ff9 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-29-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-295-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-295-1.png new file mode 100644 index 000000000..0585865d8 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-295-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-297-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-297-1.png new file mode 100644 index 000000000..ecc2e03aa Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-297-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-30-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-30-1.png new file mode 100644 index 000000000..95cb41aad Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-30-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-301-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-301-1.png new file mode 100644 index 000000000..67f3cebba Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-301-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-304-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-304-1.png new file mode 100644 index 000000000..59b11718b Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-304-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-305-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-305-1.png new file mode 100644 index 000000000..b8923fa33 Binary files /dev/null and 
b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-305-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-296-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-307-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-296-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-307-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-311-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-311-1.png new file mode 100644 index 000000000..887295abf Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-311-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-313-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-313-1.png new file mode 100644 index 000000000..361b4cf56 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-313-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-342-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-321-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-342-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-321-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-322-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-322-1.png new file mode 100644 index 000000000..3f91edc4c Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-322-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-330-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-330-1.png new file mode 100644 index 000000000..f5ebba1b3 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-330-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-332-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-332-1.png new file mode 100644 index 000000000..b66803a57 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-332-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-347-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-347-1.png new file mode 100644 index 000000000..51c650433 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-347-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-380-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-357-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-380-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-357-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-387-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-366-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-387-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-366-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-392-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-369-1.png similarity index 100% rename from 
docs/ismaykim_files/figure-html/unnamed-chunk-392-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-369-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-391-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-370-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-391-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-370-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-382-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-382-1.png new file mode 100644 index 000000000..fb7e59457 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-382-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-383-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-383-1.png new file mode 100644 index 000000000..a7c90d982 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-383-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-384-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-384-1.png new file mode 100644 index 000000000..38e224167 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-384-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-389-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-389-1.png new file mode 100644 index 000000000..7b141a8b0 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-389-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-411-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-390-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-411-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-390-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-392-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-392-1.png new file mode 100644 index 000000000..61c1fc57b Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-392-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-393-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-393-1.png new file mode 100644 index 000000000..81bb3ed7e Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-393-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-395-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-395-1.png new file mode 100644 index 000000000..2f1945041 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-395-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-404-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-404-1.png new file mode 100644 index 000000000..4efe03a77 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-404-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-434-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-410-1.png 
similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-434-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-410-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-439-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-439-1.png new file mode 100644 index 000000000..7f18858c5 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-439-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-447-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-447-1.png new file mode 100644 index 000000000..c5de3a043 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-447-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-448-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-448-1.png new file mode 100644 index 000000000..987c77048 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-448-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-452-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-452-1.png new file mode 100644 index 000000000..23529eddf Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-452-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-455-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-455-1.png new file mode 100644 index 000000000..e5fc89b14 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-455-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-456-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-456-1.png new file mode 100644 index 000000000..9fba85a96 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-456-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-460-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-460-1.png new file mode 100644 index 000000000..d8fdb2a55 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-460-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-465-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-465-1.png new file mode 100644 index 000000000..8d5cbc32d Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-465-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-466-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-466-1.png new file mode 100644 index 000000000..d58a15b80 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-466-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-470-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-470-1.png new file mode 100644 index 000000000..7330b399c Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-470-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-474-1.png 
b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-474-1.png new file mode 100644 index 000000000..70da38928 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-474-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-475-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-475-1.png new file mode 100644 index 000000000..08e92540f Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-475-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-479-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-479-1.png new file mode 100644 index 000000000..ca3cdd4bf Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-479-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-508-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-483-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-508-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-483-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-484-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-484-1.png new file mode 100644 index 000000000..c561dced3 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-484-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-488-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-488-1.png new file mode 100644 index 000000000..38bc86fb9 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-488-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-519-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-494-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-519-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-494-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-53-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-50-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-53-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-50-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-56-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-53-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-56-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-53-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-73-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-70-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-73-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-70-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/virtual-conf-int-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/virtual-conf-int-1.png new file mode 100644 index 000000000..a3dff48e8 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/virtual-conf-int-1.png differ diff --git a/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph-combined.js 
b/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph-combined.js new file mode 100644 index 000000000..7d6121e1d --- /dev/null +++ b/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph-combined.js @@ -0,0 +1,6 @@ +/*! @license Copyright 2014 Dan Vanderkam (danvdk@gmail.com) MIT-licensed (http://opensource.org/licenses/MIT) */ +!function(t){"use strict";for(var e,a,i={},r=function(){},n="memory".split(","),o="assert,clear,count,debug,dir,dirxml,error,exception,group,groupCollapsed,groupEnd,info,log,markTimeline,profile,profiles,profileEnd,show,table,time,timeEnd,timeline,timelineEnd,timeStamp,trace,warn".split(",");e=n.pop();)t[e]=t[e]||i;for(;a=o.pop();)t[a]=t[a]||r}(this.console=this.console||{}),function(){"use strict";CanvasRenderingContext2D.prototype.installPattern=function(t){if("undefined"!=typeof this.isPatternInstalled)throw"Must un-install old line pattern before installing a new one.";this.isPatternInstalled=!0;var e=[0,0],a=[],i=this.beginPath,r=this.lineTo,n=this.moveTo,o=this.stroke;this.uninstallPattern=function(){this.beginPath=i,this.lineTo=r,this.moveTo=n,this.stroke=o,this.uninstallPattern=void 0,this.isPatternInstalled=void 0},this.beginPath=function(){a=[],i.call(this)},this.moveTo=function(t,e){a.push([[t,e]]),n.call(this,t,e)},this.lineTo=function(t,e){var i=a[a.length-1];i.push([t,e])},this.stroke=function(){if(0===a.length)return void o.call(this);for(var i=0;if;){var x=t[v];f+=e[1]?e[1]:x,f>y?(e=[v,f-y],f=y):e=[(v+1)%t.length,0],v%2===0?r.call(this,f,0):n.call(this,f,0),v=(v+1)%t.length}this.restore(),l=g,h=d}o.call(this),a=[]}},CanvasRenderingContext2D.prototype.uninstallPattern=function(){throw"Must install a line pattern before uninstalling it."}}();var DygraphOptions=function(){return function(){"use strict";var t=function(t){this.dygraph_=t,this.yAxes_=[],this.xAxis_={},this.series_={},this.global_=this.dygraph_.attrs_,this.user_=this.dygraph_.user_attrs_||{},this.labels_=[],this.highlightSeries_=this.get("highlightSeriesOpts")||{},this.reparseSeries()};t.AXIS_STRING_MAPPINGS_={y:0,Y:0,y1:0,Y1:0,y2:1,Y2:1},t.axisToIndex_=function(e){if("string"==typeof e){if(t.AXIS_STRING_MAPPINGS_.hasOwnProperty(e))return t.AXIS_STRING_MAPPINGS_[e];throw"Unknown axis : "+e}if("number"==typeof e){if(0===e||1===e)return e;throw"Dygraphs only supports two y-axes, indexed from 0-1."}if(e)throw"Unknown axis : "+e;return 0},t.prototype.reparseSeries=function(){var e=this.get("labels");if(e){this.labels_=e.slice(1),this.yAxes_=[{series:[],options:{}}],this.xAxis_={options:{}},this.series_={};var a=!this.user_.series;if(a){for(var i=0,r=0;r1&&Dygraph.update(this.yAxes_[1].options,h.y2||{}),Dygraph.update(this.xAxis_.options,h.x||{})}},t.prototype.get=function(t){var e=this.getGlobalUser_(t);return null!==e?e:this.getGlobalDefault_(t)},t.prototype.getGlobalUser_=function(t){return this.user_.hasOwnProperty(t)?this.user_[t]:null},t.prototype.getGlobalDefault_=function(t){return this.global_.hasOwnProperty(t)?this.global_[t]:Dygraph.DEFAULT_ATTRS.hasOwnProperty(t)?Dygraph.DEFAULT_ATTRS[t]:null},t.prototype.getForAxis=function(t,e){var a,i;if("number"==typeof e)a=e,i=0===a?"y":"y2";else{if("y1"==e&&(e="y"),"y"==e)a=0;else if("y2"==e)a=1;else{if("x"!=e)throw"Unknown axis "+e;a=-1}i=e}var r=-1==a?this.xAxis_:this.yAxes_[a];if(r){var n=r.options;if(n.hasOwnProperty(t))return n[t]}if("x"!==e||"logscale"!==t){var o=this.getGlobalUser_(t);if(null!==o)return o}var s=Dygraph.DEFAULT_ATTRS.axes[i];return 
s.hasOwnProperty(t)?s[t]:this.getGlobalDefault_(t)},t.prototype.getForSeries=function(t,e){if(e===this.dygraph_.getHighlightSeries()&&this.highlightSeries_.hasOwnProperty(t))return this.highlightSeries_[t];if(!this.series_.hasOwnProperty(e))throw"Unknown series: "+e;var a=this.series_[e],i=a.options;return i.hasOwnProperty(t)?i[t]:this.getForAxis(t,a.yAxis)},t.prototype.numAxes=function(){return this.yAxes_.length},t.prototype.axisForSeries=function(t){return this.series_[t].yAxis},t.prototype.axisOptions=function(t){return this.yAxes_[t].options},t.prototype.seriesForAxis=function(t){return this.yAxes_[t].series},t.prototype.seriesNames=function(){return this.labels_};return t}()}(),DygraphLayout=function(){"use strict";var t=function(t){this.dygraph_=t,this.points=[],this.setNames=[],this.annotations=[],this.yAxes_=null,this.xTicks_=null,this.yTicks_=null};return t.prototype.addDataset=function(t,e){this.points.push(e),this.setNames.push(t)},t.prototype.getPlotArea=function(){return this.area_},t.prototype.computePlotArea=function(){var t={x:0,y:0};t.w=this.dygraph_.width_-t.x-this.dygraph_.getOption("rightGap"),t.h=this.dygraph_.height_;var e={chart_div:this.dygraph_.graphDiv,reserveSpaceLeft:function(e){var a={x:t.x,y:t.y,w:e,h:t.h};return t.x+=e,t.w-=e,a},reserveSpaceRight:function(e){var a={x:t.x+t.w-e,y:t.y,w:e,h:t.h};return t.w-=e,a},reserveSpaceTop:function(e){var a={x:t.x,y:t.y,w:t.w,h:e};return t.y+=e,t.h-=e,a},reserveSpaceBottom:function(e){var a={x:t.x,y:t.y+t.h-e,w:t.w,h:e};return t.h-=e,a},chartRect:function(){return{x:t.x,y:t.y,w:t.w,h:t.h}}};this.dygraph_.cascadeEvents_("layout",e),this.area_=t},t.prototype.setAnnotations=function(t){this.annotations=[];for(var e=this.dygraph_.getOption("xValueParser")||function(t){return t},a=0;a=0&&1>i&&this.xticks.push([i,a]);for(this.yticks=[],t=0;t0&&1>=i&&this.yticks.push([t,i,a])},t.prototype._evaluateAnnotations=function(){var t,e={};for(t=0;t=0;i--)a.childNodes[i].className==e&&a.removeChild(a.childNodes[i]);for(var r=document.bgColor,n=this.dygraph_.graphDiv;n!=document;){var o=n.currentStyle.backgroundColor;if(o&&"transparent"!=o){r=o;break}n=n.parentNode}var s=this.area;t({x:0,y:0,w:s.x,h:this.height}),t({x:s.x,y:0,w:this.width-s.x,h:s.y}),t({x:s.x+s.w,y:0,w:this.width-s.x-s.w,h:this.height}),t({x:s.x,y:s.y+s.h,w:this.width-s.x,h:this.height-s.h-s.y})},t._getIteratorPredicate=function(e){return e?t._predicateThatSkipsEmptyPoints:null},t._predicateThatSkipsEmptyPoints=function(t,e){return null!==t[e].yval},t._drawStyledLine=function(e,a,i,r,n,o,s){var l=e.dygraph,h=l.getBooleanOption("stepPlot",e.setName);Dygraph.isArrayLike(r)||(r=null);var p=l.getBooleanOption("drawGapEdgePoints",e.setName),g=e.points,d=e.setName,u=Dygraph.createIterator(g,0,g.length,t._getIteratorPredicate(l.getBooleanOption("connectSeparatedPoints",d))),c=r&&r.length>=2,y=e.drawingContext;y.save(),c&&y.installPattern(r);var _=t._drawSeries(e,u,i,s,n,p,h,a);t._drawPointsOnLine(e,_,o,a,s),c&&y.uninstallPattern(),y.restore()},t._drawSeries=function(t,e,a,i,r,n,o,s){var l,h,p=null,g=null,d=null,u=[],c=!0,y=t.drawingContext;y.beginPath(),y.strokeStyle=s,y.lineWidth=a;for(var _=e.array_,v=e.end_,f=e.predicate_,x=e.start_;v>x;x++){if(h=_[x],f){for(;v>x&&!f(_,x);)x++;if(x==v)break;h=_[x]}if(null===h.canvasy||h.canvasy!=h.canvasy)o&&null!==p&&(y.moveTo(p,g),y.lineTo(h.canvasx,g)),p=g=null;else{if(l=!1,n||!p){e.nextIdx_=x,e.next(),d=e.hasNext?e.peek.canvasy:null;var 
m=null===d||d!=d;l=!p&&m,n&&(!c&&!p||e.hasNext&&m)&&(l=!0)}null!==p?a&&(o&&(y.moveTo(p,g),y.lineTo(h.canvasx,g)),y.lineTo(h.canvasx,h.canvasy)):y.moveTo(h.canvasx,h.canvasy),(r||l)&&u.push([h.canvasx,h.canvasy,h.idx]),p=h.canvasx,g=h.canvasy}c=!1}return y.stroke(),u},t._drawPointsOnLine=function(t,e,a,i,r){for(var n=t.drawingContext,o=0;o0;a--){var i=e[a];if(i[0]==n){var o=e[a-1];o[1]==i[1]&&o[2]==i[2]&&e.splice(a,1)}}for(var a=0;a2&&!t){var s=0;e[0][0]==n&&s++;for(var l=null,h=null,a=s;ae[h][2]&&(h=a)}}var g=e[l],d=e[h];e.splice(s,e.length-s),h>l?(e.push(g),e.push(d)):l>h?(e.push(d),e.push(g)):e.push(g)}}},l=function(a){s(a);for(var l=0,h=e.length;h>l;l++){var p=e[l];p[0]==r?t.lineTo(p[1],p[2]):p[0]==n&&t.moveTo(p[1],p[2])}e.length&&(i=e[e.length-1][1]),o+=e.length,e=[]},h=function(t,r,n){var o=Math.round(r);if(null===a||o!=a){var s=a-i>1,h=o-a>1,p=s||h;l(p),a=o}e.push([t,r,n])};return{moveTo:function(t,e){h(n,t,e)},lineTo:function(t,e){h(r,t,e)},stroke:function(){l(!0),t.stroke()},fill:function(){l(!0),t.fill()},beginPath:function(){l(!0),t.beginPath()},closePath:function(){l(!0),t.closePath()},_count:function(){return o}}},t._fillPlotter=function(e){if(!e.singleSeriesName&&0===e.seriesIndex){for(var a=e.dygraph,i=a.getLabels().slice(1),r=i.length;r>=0;r--)a.visibility()[r]||i.splice(r,1);var n=function(){for(var t=0;t=0;r--){var n=i[r];t.lineTo(n[0],n[1])}},_=p-1;_>=0;_--){var v=e.drawingContext,f=i[_];if(a.getBooleanOption("fillGraph",f)){var x=a.getBooleanOption("stepPlot",f),m=u[_],D=a.axisPropertiesForSeries(f),w=1+D.minyval*D.yscale;0>w?w=0:w>1&&(w=1),w=l.h*w+l.y;var A,b=h[_],T=Dygraph.createIterator(b,0,b.length,t._getIteratorPredicate(a.getBooleanOption("connectSeparatedPoints",f))),E=0/0,C=[-1,-1],L=Dygraph.toRGB_(m),P="rgba("+L.r+","+L.g+","+L.b+","+g+")";v.fillStyle=P,v.beginPath();var S,O=!0;(b.length>2*a.width_||Dygraph.FORCE_FAST_PROXY)&&(v=t._fastCanvasProxy(v));for(var M,R=[];T.hasNext;)if(M=T.next(),Dygraph.isOK(M.y)||x){if(d){if(!O&&S==M.xval)continue;O=!1,S=M.xval,o=c[M.canvasx];var F;F=void 0===o?w:s?o[0]:o,A=[M.canvasy,F],x?-1===C[0]?c[M.canvasx]=[M.canvasy,w]:c[M.canvasx]=[M.canvasy,C[0]]:c[M.canvasx]=M.canvasy}else A=isNaN(M.canvasy)&&x?[l.y+l.h,w]:[M.canvasy,w];isNaN(E)?(v.moveTo(M.canvasx,A[1]),v.lineTo(M.canvasx,A[0])):(x?(v.lineTo(M.canvasx,C[0]),v.lineTo(M.canvasx,A[0])):v.lineTo(M.canvasx,A[0]),d&&(R.push([E,C[1]]),R.push(s&&o?[M.canvasx,o[1]]:[M.canvasx,A[1]]))),C=A,E=M.canvasx}else y(v,E,C[1],R),R=[],E=0/0,null===M.y_stacked||isNaN(M.y_stacked)||(c[M.canvasx]=l.h*M.y_stacked+l.y);s=x,A&&M&&(y(v,M.canvasx,A[1],R),R=[]),v.fill()}}}},t}(),Dygraph=function(){"use strict";var t=function(t,e,a,i){this.is_initial_draw_=!0,this.readyFns_=[],void 0!==i?(console.warn("Using deprecated four-argument dygraph constructor"),this.__old_init__(t,e,a,i)):this.__init__(t,e,a)};return t.NAME="Dygraph",t.VERSION="1.1.1",t.__repr__=function(){return"["+t.NAME+" "+t.VERSION+"]"},t.toString=function(){return t.__repr__()},t.DEFAULT_ROLL_PERIOD=1,t.DEFAULT_WIDTH=480,t.DEFAULT_HEIGHT=320,t.ANIMATION_STEPS=12,t.ANIMATION_DURATION=200,t.KMB_LABELS=["K","M","B","T","Q"],t.KMG2_BIG_LABELS=["k","M","G","T","P","E","Z","Y"],t.KMG2_SMALL_LABELS=["m","u","n","p","f","a","z","y"],t.numberValueFormatter=function(e,a){var i=a("sigFigs");if(null!==i)return t.floatFormat(e,i);var r,n=a("digitsAfterDecimal"),o=a("maxNumberWidth"),s=a("labelsKMB"),l=a("labelsKMG2");if(r=0!==e&&(Math.abs(e)>=Math.pow(10,o)||Math.abs(e)=0;c--,u/=h)if(d>=u){r=t.round_(e/u,n)+p[c];break}if(l){var 
y=String(e.toExponential()).split("e-");2===y.length&&y[1]>=3&&y[1]<=24&&(r=y[1]%3>0?t.round_(y[0]/t.pow(10,y[1]%3),n):Number(y[0]).toFixed(2),r+=g[Math.floor(y[1]/3)-1])}}return r},t.numberAxisLabelFormatter=function(e,a,i){return t.numberValueFormatter.call(this,e,i)},t.SHORT_MONTH_NAMES_=["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],t.dateAxisLabelFormatter=function(e,a,i){var r=i("labelsUTC"),n=r?t.DateAccessorsUTC:t.DateAccessorsLocal,o=n.getFullYear(e),s=n.getMonth(e),l=n.getDate(e),h=n.getHours(e),p=n.getMinutes(e),g=n.getSeconds(e),d=n.getSeconds(e);if(a>=t.DECADAL)return""+o;if(a>=t.MONTHLY)return t.SHORT_MONTH_NAMES_[s]+" "+o;var u=3600*h+60*p+g+.001*d;return 0===u||a>=t.DAILY?t.zeropad(l)+" "+t.SHORT_MONTH_NAMES_[s]:t.hmsString_(h,p,g)},t.dateAxisFormatter=t.dateAxisLabelFormatter,t.dateValueFormatter=function(e,a){return t.dateString_(e,a("labelsUTC"))},t.Plotters=DygraphCanvasRenderer._Plotters,t.DEFAULT_ATTRS={highlightCircleSize:3,highlightSeriesOpts:null,highlightSeriesBackgroundAlpha:.5,labelsDivWidth:250,labelsDivStyles:{},labelsSeparateLines:!1,labelsShowZeroValues:!0,labelsKMB:!1,labelsKMG2:!1,showLabelsOnHighlight:!0,digitsAfterDecimal:2,maxNumberWidth:6,sigFigs:null,strokeWidth:1,strokeBorderWidth:0,strokeBorderColor:"white",axisTickSize:3,axisLabelFontSize:14,rightGap:5,showRoller:!1,xValueParser:t.dateParser,delimiter:",",sigma:2,errorBars:!1,fractions:!1,wilsonInterval:!0,customBars:!1,fillGraph:!1,fillAlpha:.15,connectSeparatedPoints:!1,stackedGraph:!1,stackedGraphNaNFill:"all",hideOverlayOnMouseOut:!0,legend:"onmouseover",stepPlot:!1,avoidMinZero:!1,xRangePad:0,yRangePad:null,drawAxesAtZero:!1,titleHeight:28,xLabelHeight:18,yLabelWidth:18,drawXAxis:!0,drawYAxis:!0,axisLineColor:"black",axisLineWidth:.3,gridLineWidth:.3,axisLabelColor:"black",axisLabelWidth:50,drawYGrid:!0,drawXGrid:!0,gridLineColor:"rgb(128,128,128)",interactionModel:null,animatedZooms:!1,showRangeSelector:!1,rangeSelectorHeight:40,rangeSelectorPlotStrokeColor:"#808FAB",rangeSelectorPlotFillColor:"#A7B1C4",showInRangeSelector:null,plotter:[t.Plotters.fillPlotter,t.Plotters.errorPlotter,t.Plotters.linePlotter],plugins:[],axes:{x:{pixelsPerLabel:70,axisLabelWidth:60,axisLabelFormatter:t.dateAxisLabelFormatter,valueFormatter:t.dateValueFormatter,drawGrid:!0,drawAxis:!0,independentTicks:!0,ticker:null},y:{axisLabelWidth:50,pixelsPerLabel:30,valueFormatter:t.numberValueFormatter,axisLabelFormatter:t.numberAxisLabelFormatter,drawGrid:!0,drawAxis:!0,independentTicks:!0,ticker:null},y2:{axisLabelWidth:50,pixelsPerLabel:30,valueFormatter:t.numberValueFormatter,axisLabelFormatter:t.numberAxisLabelFormatter,drawAxis:!0,drawGrid:!1,independentTicks:!1,ticker:null}}},t.HORIZONTAL=1,t.VERTICAL=2,t.PLUGINS=[],t.addedAnnotationCSS=!1,t.prototype.__old_init__=function(e,a,i,r){if(null!==i){for(var n=["Date"],o=0;o=0;n--){var o=r[n][0],s=r[n][1];if(s.call(o,i),i.propagationStopped)break}return i.defaultPrevented},t.prototype.getPluginInstance_=function(t){for(var e=0;et||t>=this.axes_.length)return null;var e=this.axes_[t];return[e.computedValueRange[0],e.computedValueRange[1]]},t.prototype.yAxisRanges=function(){for(var t=[],e=0;et||t>this.rawData_.length?null:0>e||e>this.rawData_[t].length?null:this.rawData_[t][e]},t.prototype.createInterface_=function(){var 
e=this.maindiv_;this.graphDiv=document.createElement("div"),this.graphDiv.style.textAlign="left",this.graphDiv.style.position="relative",e.appendChild(this.graphDiv),this.canvas_=t.createCanvas(),this.canvas_.style.position="absolute",this.hidden_=this.createPlotKitCanvas_(this.canvas_),this.canvas_ctx_=t.getContext(this.canvas_),this.hidden_ctx_=t.getContext(this.hidden_),this.resizeElements_(),this.graphDiv.appendChild(this.hidden_),this.graphDiv.appendChild(this.canvas_),this.mouseEventElement_=this.createMouseEventElement_(),this.layout_=new DygraphLayout(this);var a=this;this.mouseMoveHandler_=function(t){a.mouseMove_(t)},this.mouseOutHandler_=function(e){var i=e.target||e.fromElement,r=e.relatedTarget||e.toElement;t.isNodeContainedBy(i,a.graphDiv)&&!t.isNodeContainedBy(r,a.graphDiv)&&a.mouseOut_(e)},this.addAndTrackEvent(window,"mouseout",this.mouseOutHandler_),this.addAndTrackEvent(this.mouseEventElement_,"mousemove",this.mouseMoveHandler_),this.resizeHandler_||(this.resizeHandler_=function(t){a.resize()},this.addAndTrackEvent(window,"resize",this.resizeHandler_))},t.prototype.resizeElements_=function(){this.graphDiv.style.width=this.width_+"px",this.graphDiv.style.height=this.height_+"px";var e=t.getContextPixelRatio(this.canvas_ctx_);this.canvas_.width=this.width_*e,this.canvas_.height=this.height_*e,this.canvas_.style.width=this.width_+"px",this.canvas_.style.height=this.height_+"px",1!==e&&this.canvas_ctx_.scale(e,e);var a=t.getContextPixelRatio(this.hidden_ctx_);this.hidden_.width=this.width_*a,this.hidden_.height=this.height_*a,this.hidden_.style.width=this.width_+"px",this.hidden_.style.height=this.height_+"px",1!==a&&this.hidden_ctx_.scale(a,a)},t.prototype.destroy=function(){this.canvas_ctx_.restore(),this.hidden_ctx_.restore();for(var e=this.plugins_.length-1;e>=0;e--){var a=this.plugins_.pop();a.plugin.destroy&&a.plugin.destroy()}var i=function(t){for(;t.hasChildNodes();)i(t.firstChild),t.removeChild(t.firstChild)};this.removeTrackedEvents_(),t.removeEvent(window,"mouseout",this.mouseOutHandler_),t.removeEvent(this.mouseEventElement_,"mousemove",this.mouseMoveHandler_),t.removeEvent(window,"resize",this.resizeHandler_),this.resizeHandler_=null,i(this.maindiv_);var r=function(t){for(var e in t)"object"==typeof t[e]&&(t[e]=null)};r(this.layout_),r(this.plotter_),r(this)},t.prototype.createPlotKitCanvas_=function(e){var a=t.createCanvas();return a.style.position="absolute",a.style.top=e.style.top,a.style.left=e.style.left,a.width=this.width_,a.height=this.height_,a.style.width=this.width_+"px",a.style.height=this.height_+"px",a},t.prototype.createMouseEventElement_=function(){if(this.isUsingExcanvas_){var t=document.createElement("div");return t.style.position="absolute",t.style.backgroundColor="white",t.style.filter="alpha(opacity=0)",t.style.width=this.width_+"px",t.style.height=this.height_+"px",this.graphDiv.appendChild(t),t}return this.canvas_},t.prototype.setColors_=function(){var e=this.getLabels(),a=e.length-1;this.colors_=[],this.colorsMap_={};for(var i=this.getNumericOption("colorSaturation")||1,r=this.getNumericOption("colorValue")||.5,n=Math.ceil(a/2),o=this.getOption("colors"),s=this.visibility(),l=0;a>l;l++)if(s[l]){ +var h=e[l+1],p=this.attributes_.getForSeries("color",h);if(!p)if(o)p=o[l%o.length];else{var g=l%2?n+(l+1)/2:Math.ceil((l+1)/2),d=1*g/(1+a);p=t.hsvToRGB(d,i,r)}this.colors_.push(p),this.colorsMap_[h]=p}},t.prototype.getColors=function(){return this.colors_},t.prototype.getPropertiesForSeries=function(t){for(var 
e=-1,a=this.getLabels(),i=1;i=o;o++)s=t.zoomAnimationFunction(o,l),h[o-1]=[e[0]*(1-s)+s*a[0],e[1]*(1-s)+s*a[1]];if(null!==i&&null!==r)for(o=1;l>=o;o++){s=t.zoomAnimationFunction(o,l);for(var g=[],d=0;dl;l++){var h=o[l];if(t.isValidPoint(h,!0)){var p=Math.abs(h.canvasx-e);a>p&&(a=p,i=h.idx)}}return i},t.prototype.findClosestPoint=function(e,a){for(var i,r,n,o,s,l,h,p=1/0,g=this.layout_.points.length-1;g>=0;--g)for(var d=this.layout_.points[g],u=0;ui&&(p=i,s=o,l=g,h=o.idx));var c=this.layout_.setNames[l];return{row:h,seriesName:c,point:s}},t.prototype.findStackedPoint=function(e,a){for(var i,r,n=this.findClosestRow(e),o=0;o=h.length)){var p=h[l];if(t.isValidPoint(p)){var g=p.canvasy;if(e>p.canvasx&&l+10){var c=(e-p.canvasx)/u;g+=c*(d.canvasy-p.canvasy)}}}else if(e0){var y=h[l-1];if(t.isValidPoint(y)){var u=p.canvasx-y.canvasx;if(u>0){var c=(p.canvasx-e)/u;g+=c*(y.canvasy-p.canvasy)}}}(0===o||a>g)&&(i=p,r=o)}}}var _=this.layout_.setNames[r];return{row:n,seriesName:_,point:i}},t.prototype.mouseMove_=function(t){var e=this.layout_.points;if(void 0!==e&&null!==e){var a=this.eventToDomCoords(t),i=a[0],r=a[1],n=this.getOption("highlightSeriesOpts"),o=!1;if(n&&!this.isSeriesLocked()){var s;s=this.getBooleanOption("stackedGraph")?this.findStackedPoint(i,r):this.findClosestPoint(i,r),o=this.setSelection(s.row,s.seriesName)}else{var l=this.findClosestRow(i);o=this.setSelection(l)}var h=this.getFunctionOption("highlightCallback");h&&o&&h.call(this,t,this.lastx_,this.selPoints_,this.lastRow_,this.highlightSet_)}},t.prototype.getLeftBoundary_=function(t){if(this.boundaryIds_[t])return this.boundaryIds_[t][0];for(var e=0;ee?r:a-r;if(0>=n)return void(this.fadeLevel&&this.updateSelection_(1));var o=++this.animateId,s=this;t.repeatAndCleanup(function(t){s.animateId==o&&(s.fadeLevel+=e,0===s.fadeLevel?s.clearSelection():s.updateSelection_(s.fadeLevel/a))},n,i,function(){})},t.prototype.updateSelection_=function(e){this.cascadeEvents_("select",{selectedRow:this.lastRow_,selectedX:this.lastx_,selectedPoints:this.selPoints_});var a,i=this.canvas_ctx_;if(this.getOption("highlightSeriesOpts")){i.clearRect(0,0,this.width_,this.height_);var r=1-this.getNumericOption("highlightSeriesBackgroundAlpha");if(r){var n=!0;if(n){if(void 0===e)return void this.animateSelection_(1);r*=e}i.fillStyle="rgba(255,255,255,"+r+")",i.fillRect(0,0,this.width_,this.height_)}this.plotter_._renderLineChart(this.highlightSet_,i)}else if(this.previousVerticalX_>=0){var o=0,s=this.attr_("labels");for(a=1;ao&&(o=l)}var h=this.previousVerticalX_;i.clearRect(h-o-1,0,2*o+2,this.height_)}if(this.isUsingExcanvas_&&this.currentZoomRectArgs_&&t.prototype.drawZoomRect_.apply(this,this.currentZoomRectArgs_),this.selPoints_.length>0){var p=this.selPoints_[0].canvasx;for(i.save(),a=0;a=0){t!=this.lastRow_&&(i=!0),this.lastRow_=t;for(var r=0;r=0&&(i=!0),this.lastRow_=-1;return this.selPoints_.length?this.lastx_=this.selPoints_[0].xval:this.lastx_=-1,void 0!==e&&(this.highlightSet_!==e&&(i=!0),this.highlightSet_=e),void 0!==a&&(this.lockedSet_=a),i&&this.updateSelection_(void 0),i},t.prototype.mouseOut_=function(t){this.getFunctionOption("unhighlightCallback")&&this.getFunctionOption("unhighlightCallback").call(this,t),this.getBooleanOption("hideOverlayOnMouseOut")&&!this.lockedSet_&&this.clearSelection()},t.prototype.clearSelection=function(){return this.cascadeEvents_("deselect",{}),this.lockedSet_=!1,this.fadeLevel?void 
this.animateSelection_(-1):(this.canvas_ctx_.clearRect(0,0,this.width_,this.height_),this.fadeLevel=0,this.selPoints_=[],this.lastx_=-1,this.lastRow_=-1,void(this.highlightSet_=null))},t.prototype.getSelection=function(){if(!this.selPoints_||this.selPoints_.length<1)return-1;for(var t=0;t1&&(a=this.dataHandler_.rollingAverage(a,this.rollPeriod_,this.attributes_)),this.rolledSeries_.push(a)}this.drawGraph_();var i=new Date;this.drawingTimeMs_=i-t},t.PointType=void 0,t.stackPoints_=function(t,e,a,i){for(var r=null,n=null,o=null,s=-1,l=function(e){if(!(s>=e))for(var a=e;aa[1]&&(a[1]=u),u=1;i--)if(this.visibility()[i-1]){if(a){l=e[i];var c=a[0],y=a[1];for(n=null,o=null,r=0;r=c&&null===n&&(n=r),l[r][0]<=y&&(o=r);null===n&&(n=0);for(var _=n,v=!0;v&&_>0;)_--,v=null===l[_][1];null===o&&(o=l.length-1);var f=o;for(v=!0;v&&f0&&(this.setIndexByName_[n[0]]=0);for(var o=0,s=1;s0;){var a=this.readyFns_.pop();a(this)}},t.prototype.computeYAxes_=function(){var e,a,i,r,n;if(void 0!==this.axes_&&this.user_attrs_.hasOwnProperty("valueRange")===!1)for(e=[],i=0;ii;i++)this.axes_[i].valueWindow=e[i]}for(a=0;al;l++){var h=this.axes_[l],p=this.attributes_.getForAxis("logscale",l),g=this.attributes_.getForAxis("includeZero",l),d=this.attributes_.getForAxis("independentTicks",l);if(i=this.attributes_.seriesForAxis(l),e=!0,r=.1,null!==this.getNumericOption("yRangePad")&&(e=!1,r=this.getNumericOption("yRangePad")/this.plotter_.area.h),0===i.length)h.extremeRange=[0,1];else{for(var u,c,y=1/0,_=-(1/0),v=0;v0&&(y=0),0>_&&(_=0)),y==1/0&&(y=0),_==-(1/0)&&(_=1),a=_-y,0===a&&(0!==_?a=Math.abs(_):(_=1,a=1));var f,x;if(p)if(e)f=_+r*a,x=y;else{var m=Math.exp(Math.log(a)*r);f=_*m,x=y/m}else f=_+r*a,x=y-r*a,e&&!this.getBooleanOption("avoidMinZero")&&(0>x&&y>=0&&(x=0),f>0&&0>=_&&(f=0));h.extremeRange=[x,f]}if(h.valueWindow)h.computedValueRange=[h.valueWindow[0],h.valueWindow[1]];else if(h.valueRange){var D=o(h.valueRange[0])?h.extremeRange[0]:h.valueRange[0],w=o(h.valueRange[1])?h.extremeRange[1]:h.valueRange[1];if(!e)if(h.logscale){var m=Math.exp(Math.log(a)*r);D*=m,w/=m}else a=w-D,D-=a*r,w+=a*r;h.computedValueRange=[D,w]}else h.computedValueRange=h.extremeRange;if(d){h.independentTicks=d;var A=this.optionsViewForAxis_("y"+(l?"2":"")),b=A("ticker");h.ticks=b(h.computedValueRange[0],h.computedValueRange[1],this.plotter_.area.h,A,this),n||(n=h)}}if(void 0===n)throw'Configuration Error: At least one axis has to have the "independentTicks" option activated.';for(var l=0;s>l;l++){var h=this.axes_[l];if(!h.independentTicks){for(var A=this.optionsViewForAxis_("y"+(l?"2":"")),b=A("ticker"),T=n.ticks,E=n.computedValueRange[1]-n.computedValueRange[0],C=h.computedValueRange[1]-h.computedValueRange[0],L=[],P=0;P0&&"e"!=t[a-1]&&"E"!=t[a-1]||t.indexOf("/")>=0||isNaN(parseFloat(t))?e=!0:8==t.length&&t>"19700101"&&"20371231">t&&(e=!0),this.setXAxisOptions_(e)},t.prototype.setXAxisOptions_=function(e){e?(this.attrs_.xValueParser=t.dateParser,this.attrs_.axes.x.valueFormatter=t.dateValueFormatter,this.attrs_.axes.x.ticker=t.dateTicker,this.attrs_.axes.x.axisLabelFormatter=t.dateAxisLabelFormatter):(this.attrs_.xValueParser=function(t){return parseFloat(t)},this.attrs_.axes.x.valueFormatter=function(t){return t},this.attrs_.axes.x.ticker=t.numericTicks,this.attrs_.axes.x.axisLabelFormatter=this.attrs_.axes.x.valueFormatter)},t.prototype.parseCSV_=function(e){var a,i,r=[],n=t.detectLineDelimiter(e),o=e.split(n||"\n"),s=this.getStringOption("delimiter");-1==o[0].indexOf(s)&&o[0].indexOf(" ")>=0&&(s=" ");var l=0;"labels"in 
this.user_attrs_||(l=1,this.attrs_.labels=o[0].split(s),this.attributes_.reparseSeries());for(var h,p=0,g=!1,d=this.attr_("labels").length,u=!1,c=l;c0&&v[0]0;)e=String.fromCharCode(65+(t-1)%26)+e.toLowerCase(),t=Math.floor((t-1)/26);return e},i=e.getNumberOfColumns(),r=e.getNumberOfRows(),n=e.getColumnType(0);if("date"==n||"datetime"==n)this.attrs_.xValueParser=t.dateParser,this.attrs_.axes.x.valueFormatter=t.dateValueFormatter,this.attrs_.axes.x.ticker=t.dateTicker,this.attrs_.axes.x.axisLabelFormatter=t.dateAxisLabelFormatter;else{if("number"!=n)return console.error("only 'date', 'datetime' and 'number' types are supported for column 1 of DataTable input (Got '"+n+"')"),null;this.attrs_.xValueParser=function(t){return parseFloat(t)},this.attrs_.axes.x.valueFormatter=function(t){return t},this.attrs_.axes.x.ticker=t.numericTicks,this.attrs_.axes.x.axisLabelFormatter=this.attrs_.axes.x.valueFormatter}var o,s,l=[],h={},p=!1;for(o=1;i>o;o++){var g=e.getColumnType(o);if("number"==g)l.push(o);else if("string"==g&&this.getBooleanOption("displayAnnotations")){var d=l[l.length-1];h.hasOwnProperty(d)?h[d].push(o):h[d]=[o],p=!0}else console.error("Only 'number' is supported as a dependent type with Gviz. 'string' is only supported if displayAnnotations is true")}var u=[e.getColumnLabel(0)];for(o=0;oo;o++){var v=[];if("undefined"!=typeof e.getValue(o,0)&&null!==e.getValue(o,0)){if(v.push("date"==n||"datetime"==n?e.getValue(o,0).getTime():e.getValue(o,0)),this.getBooleanOption("errorBars"))for(s=0;i-1>s;s++)v.push([e.getValue(o,1+2*s),e.getValue(o,2+2*s)]);else{for(s=0;s0&&v[0]0&&this.setAnnotations(_,!0),this.attributes_.reparseSeries()},t.prototype.cascadeDataDidUpdateEvent_=function(){this.cascadeEvents_("dataDidUpdate",{})},t.prototype.start_=function(){var e=this.file_;if("function"==typeof e&&(e=e()),t.isArrayLike(e))this.rawData_=this.parseArray_(e),this.cascadeDataDidUpdateEvent_(),this.predraw_();else if("object"==typeof e&&"function"==typeof e.getColumnRange)this.parseDataTable_(e),this.cascadeDataDidUpdateEvent_(),this.predraw_();else if("string"==typeof e){var a=t.detectLineDelimiter(e);if(a)this.loadedEvent_(e);else{var i;i=window.XMLHttpRequest?new XMLHttpRequest:new ActiveXObject("Microsoft.XMLHTTP");var r=this;i.onreadystatechange=function(){4==i.readyState&&(200===i.status||0===i.status)&&r.loadedEvent_(i.responseText)},i.open("GET",e,!0),i.send(null)}}else console.error("Unknown data format: "+typeof e)},t.prototype.updateOptions=function(e,a){"undefined"==typeof a&&(a=!1);var i=e.file,r=t.mapLegacyOptions_(e);"rollPeriod"in r&&(this.rollPeriod_=r.rollPeriod),"dateWindow"in r&&(this.dateWindow_=r.dateWindow,"isZoomedIgnoreProgrammaticZoom"in r||(this.zoomed_x_=null!==r.dateWindow)),"valueRange"in r&&!("isZoomedIgnoreProgrammaticZoom"in r)&&(this.zoomed_y_=null!==r.valueRange);var n=t.isPixelChangingOptionList(this.attr_("labels"),r);t.updateDeep(this.user_attrs_,r),this.attributes_.reparseSeries(),i?(this.cascadeEvents_("dataWillUpdate",{}),this.file_=i,a||this.start_()):a||(n?this.predraw_():this.renderGraph_(!1))},t.mapLegacyOptions_=function(t){var e={};for(var a in t)t.hasOwnProperty(a)&&"file"!=a&&t.hasOwnProperty(a)&&(e[a]=t[a]);var i=function(t,a,i){e.axes||(e.axes={}),e.axes[t]||(e.axes[t]={}),e.axes[t][a]=i},r=function(a,r,n){"undefined"!=typeof t[a]&&(console.warn("Option "+a+" is deprecated. Use the "+n+" option for the "+r+" axis instead. (e.g. { axes : { "+r+" : { "+n+" : ... 
} } } (see http://dygraphs.com/per-axis.html for more information."),i(r,n,t[a]),delete e[a])};return r("xValueFormatter","x","valueFormatter"),r("pixelsPerXLabel","x","pixelsPerLabel"),r("xAxisLabelFormatter","x","axisLabelFormatter"),r("xTicker","x","ticker"),r("yValueFormatter","y","valueFormatter"),r("pixelsPerYLabel","y","pixelsPerLabel"),r("yAxisLabelFormatter","y","axisLabelFormatter"),r("yTicker","y","ticker"),r("drawXGrid","x","drawGrid"),r("drawXAxis","x","drawAxis"),r("drawYGrid","y","drawGrid"),r("drawYAxis","y","drawAxis"),r("xAxisLabelWidth","x","axisLabelWidth"),r("yAxisLabelWidth","y","axisLabelWidth"),e},t.prototype.resize=function(t,e){if(!this.resize_lock){this.resize_lock=!0,null===t!=(null===e)&&(console.warn("Dygraph.resize() should be called with zero parameters or two non-NULL parameters. Pretending it was zero."),t=e=null);var a=this.width_,i=this.height_;t?(this.maindiv_.style.width=t+"px",this.maindiv_.style.height=e+"px",this.width_=t,this.height_=e):(this.width_=this.maindiv_.clientWidth,this.height_=this.maindiv_.clientHeight),(a!=this.width_||i!=this.height_)&&(this.resizeElements_(),this.predraw_()),this.resize_lock=!1}},t.prototype.adjustRoll=function(t){this.rollPeriod_=t,this.predraw_()},t.prototype.visibility=function(){for(this.getOption("visibility")||(this.attrs_.visibility=[]);this.getOption("visibility").lengtht||t>=a.length?console.warn("invalid series number in setVisibility: "+t):(a[t]=e,this.predraw_())},t.prototype.size=function(){return{width:this.width_,height:this.height_}},t.prototype.setAnnotations=function(e,a){return t.addAnnotationRule(),this.annotations_=e,this.layout_?(this.layout_.setAnnotations(this.annotations_),void(a||this.predraw_())):void console.warn("Tried to setAnnotations before dygraph was ready. Try setting them in a ready() block. 
See dygraphs.com/tests/annotation.html")},t.prototype.annotations=function(){return this.annotations_},t.prototype.getLabels=function(){var t=this.attr_("labels");return t?t.slice():null},t.prototype.indexFromSetName=function(t){return this.setIndexByName_[t]},t.prototype.ready=function(t){this.is_initial_draw_?this.readyFns_.push(t):t.call(this,this)},t.addAnnotationRule=function(){if(!t.addedAnnotationCSS){var e="border: 1px solid black; background-color: white; text-align: center;",a=document.createElement("style");a.type="text/css",document.getElementsByTagName("head")[0].appendChild(a);for(var i=0;it?"0"+t:""+t},Dygraph.DateAccessorsLocal={getFullYear:function(t){return t.getFullYear()},getMonth:function(t){return t.getMonth()},getDate:function(t){return t.getDate()},getHours:function(t){return t.getHours()},getMinutes:function(t){return t.getMinutes()},getSeconds:function(t){return t.getSeconds()},getMilliseconds:function(t){return t.getMilliseconds()},getDay:function(t){return t.getDay()},makeDate:function(t,e,a,i,r,n,o){return new Date(t,e,a,i,r,n,o)}},Dygraph.DateAccessorsUTC={getFullYear:function(t){return t.getUTCFullYear()},getMonth:function(t){return t.getUTCMonth()},getDate:function(t){return t.getUTCDate()},getHours:function(t){return t.getUTCHours()},getMinutes:function(t){return t.getUTCMinutes()},getSeconds:function(t){return t.getUTCSeconds()},getMilliseconds:function(t){return t.getUTCMilliseconds()},getDay:function(t){return t.getUTCDay()},makeDate:function(t,e,a,i,r,n,o){return new Date(Date.UTC(t,e,a,i,r,n,o))}},Dygraph.hmsString_=function(t,e,a){var i=Dygraph.zeropad,r=i(t)+":"+i(e);return a&&(r+=":"+i(a)),r},Dygraph.dateString_=function(t,e){var a=Dygraph.zeropad,i=e?Dygraph.DateAccessorsUTC:Dygraph.DateAccessorsLocal,r=new Date(t),n=i.getFullYear(r),o=i.getMonth(r),s=i.getDate(r),l=i.getHours(r),h=i.getMinutes(r),p=i.getSeconds(r),g=""+n,d=a(o+1),u=a(s),c=3600*l+60*h+p,y=g+"/"+d+"/"+u;return c&&(y+=" "+Dygraph.hmsString_(l,h,p)),y},Dygraph.round_=function(t,e){var a=Math.pow(10,e);return Math.round(t*a)/a},Dygraph.binarySearch=function(t,e,a,i,r){if((null===i||void 0===i||null===r||void 0===r)&&(i=0,r=e.length-1),i>r)return-1;(null===a||void 0===a)&&(a=0);var n,o=function(t){return t>=0&&tt?a>0&&(n=s-1,o(n)&&e[n]l?0>a&&(n=s+1,o(n)&&e[n]>t)?s:Dygraph.binarySearch(t,e,a,s+1,r):-1},Dygraph.dateParser=function(t){var e,a;if((-1==t.search("-")||-1!=t.search("T")||-1!=t.search("Z"))&&(a=Dygraph.dateStrToMillis(t),a&&!isNaN(a)))return a;if(-1!=t.search("-")){for(e=t.replace("-","/","g");-1!=e.search("-");)e=e.replace("-","/");a=Dygraph.dateStrToMillis(e)}else 8==t.length?(e=t.substr(0,4)+"/"+t.substr(4,2)+"/"+t.substr(6,2),a=Dygraph.dateStrToMillis(e)):a=Dygraph.dateStrToMillis(t);return(!a||isNaN(a))&&console.error("Couldn't parse "+t+" as a date"),a},Dygraph.dateStrToMillis=function(t){return new Date(t).getTime()},Dygraph.update=function(t,e){if("undefined"!=typeof e&&null!==e)for(var a in e)e.hasOwnProperty(a)&&(t[a]=e[a]);return t},Dygraph.updateDeep=function(t,e){function a(t){return"object"==typeof Node?t instanceof Node:"object"==typeof t&&"number"==typeof t.nodeType&&"string"==typeof t.nodeName}if("undefined"!=typeof e&&null!==e)for(var i in e)e.hasOwnProperty(i)&&(null===e[i]?t[i]=null:Dygraph.isArrayLike(e[i])?t[i]=e[i].slice():a(e[i])?t[i]=e[i]:"object"==typeof e[i]?(("object"!=typeof t[i]||null===t[i])&&(t[i]={}),Dygraph.updateDeep(t[i],e[i])):t[i]=e[i]);return t},Dygraph.isArrayLike=function(t){var e=typeof 
t;return"object"!=e&&("function"!=e||"function"!=typeof t.item)||null===t||"number"!=typeof t.length||3===t.nodeType?!1:!0},Dygraph.isDateLike=function(t){return"object"!=typeof t||null===t||"function"!=typeof t.getTime?!1:!0},Dygraph.clone=function(t){for(var e=[],a=0;a=e||Dygraph.requestAnimFrame.call(window,function(){var e=(new Date).getTime(),h=e-o;r=n,n=Math.floor(h/a);var p=n-r,g=n+p>s;g||n>=s?(t(s),i()):(0!==p&&t(n),l())})}()};var e={annotationClickHandler:!0,annotationDblClickHandler:!0,annotationMouseOutHandler:!0,annotationMouseOverHandler:!0,axisLabelColor:!0,axisLineColor:!0,axisLineWidth:!0,clickCallback:!0,drawCallback:!0,drawHighlightPointCallback:!0,drawPoints:!0,drawPointCallback:!0,drawXGrid:!0,drawYGrid:!0,fillAlpha:!0,gridLineColor:!0,gridLineWidth:!0,hideOverlayOnMouseOut:!0,highlightCallback:!0,highlightCircleSize:!0,interactionModel:!0,isZoomedIgnoreProgrammaticZoom:!0,labelsDiv:!0,labelsDivStyles:!0,labelsDivWidth:!0,labelsKMB:!0,labelsKMG2:!0,labelsSeparateLines:!0,labelsShowZeroValues:!0,legend:!0,panEdgeFraction:!0,pixelsPerYLabel:!0,pointClickCallback:!0,pointSize:!0,rangeSelectorPlotFillColor:!0,rangeSelectorPlotStrokeColor:!0,showLabelsOnHighlight:!0,showRoller:!0,strokeWidth:!0,underlayCallback:!0,unhighlightCallback:!0,zoomCallback:!0};Dygraph.isPixelChangingOptionList=function(t,a){var i={};if(t)for(var r=1;re?1/Math.pow(t,-e):Math.pow(t,e)};var a=/^rgba?\((\d{1,3}),\s*(\d{1,3}),\s*(\d{1,3})(?:,\s*([01](?:\.\d+)?))?\)$/;Dygraph.toRGB_=function(e){var a=t(e);if(a)return a;var i=document.createElement("div");i.style.backgroundColor=e,i.style.visibility="hidden",document.body.appendChild(i);var r;return r=window.getComputedStyle?window.getComputedStyle(i,null).backgroundColor:i.currentStyle.backgroundColor,document.body.removeChild(i),t(r)},Dygraph.isCanvasSupported=function(t){var e;try{e=t||document.createElement("canvas"),e.getContext("2d")}catch(a){var i=navigator.appVersion.match(/MSIE (\d\.\d)/),r=-1!=navigator.userAgent.toLowerCase().indexOf("opera");return!i||i[1]<6||r?!1:!0}return!0},Dygraph.parseFloat_=function(t,e,a){var i=parseFloat(t);if(!isNaN(i))return i;if(/^ *$/.test(t))return null;if(/^ *nan *$/i.test(t))return 0/0;var r="Unable to parse '"+t+"' as a number";return void 0!==a&&void 0!==e&&(r+=" on line "+(1+(e||0))+" ('"+a+"') of CSV."),console.error(r),null}}(),function(){"use strict";Dygraph.GVizChart=function(t){this.container=t},Dygraph.GVizChart.prototype.draw=function(t,e){this.container.innerHTML="","undefined"!=typeof this.date_graph&&this.date_graph.destroy(),this.date_graph=new Dygraph(this.container,t,e)},Dygraph.GVizChart.prototype.setSelection=function(t){var e=!1;t.length&&(e=t[0].row),this.date_graph.setSelection(e)},Dygraph.GVizChart.prototype.getSelection=function(){var t=[],e=this.date_graph.getSelection();if(0>e)return t;for(var a=this.date_graph.layout_.points,i=0;ii&&2>r&&void 0!==e.lastx_&&-1!=e.lastx_&&Dygraph.Interaction.treatMouseOpAsClick(e,t,a),a.regionWidth=i,a.regionHeight=r},Dygraph.Interaction.startPan=function(t,e,a){var i,r;a.isPanning=!0;var n=e.xAxisRange();if(e.getOptionForAxis("logscale","x")?(a.initialLeftmostDate=Dygraph.log10(n[0]),a.dateRange=Dygraph.log10(n[1])-Dygraph.log10(n[0])):(a.initialLeftmostDate=n[0],a.dateRange=n[1]-n[0]),a.xUnitsPerPixel=a.dateRange/(e.plotter_.area.w-1),e.getNumericOption("panEdgeFraction")){var 
o=e.width_*e.getNumericOption("panEdgeFraction"),s=e.xAxisExtremes(),l=e.toDomXCoord(s[0])-o,h=e.toDomXCoord(s[1])+o,p=e.toDataXCoord(l),g=e.toDataXCoord(h);a.boundedDates=[p,g];var d=[],u=e.height_*e.getNumericOption("panEdgeFraction");for(i=0;ia.boundedDates[1]&&(i-=r-a.boundedDates[1],r=i+a.dateRange),e.getOptionForAxis("logscale","x")?e.dateWindow_=[Math.pow(Dygraph.LOG_SCALE,i),Math.pow(Dygraph.LOG_SCALE,r)]:e.dateWindow_=[i,r],a.is2DPan)for(var n=a.dragEndY-a.dragStartY,o=0;oi?Dygraph.VERTICAL:Dygraph.HORIZONTAL,e.drawZoomRect_(a.dragDirection,a.dragStartX,a.dragEndX,a.dragStartY,a.dragEndY,a.prevDragDirection,a.prevEndX,a.prevEndY),a.prevEndX=a.dragEndX,a.prevEndY=a.dragEndY,a.prevDragDirection=a.dragDirection},Dygraph.Interaction.treatMouseOpAsClick=function(t,e,a){for(var i=t.getFunctionOption("clickCallback"),r=t.getFunctionOption("pointClickCallback"),n=null,o=-1,s=Number.MAX_VALUE,l=0;lp)&&(s=p,o=l)}var g=t.getNumericOption("highlightCircleSize")+2;if(g*g>=s&&(n=t.selPoints_[o]),n){var d={cancelable:!0,point:n,canvasx:a.dragEndX,canvasy:a.dragEndY},u=t.cascadeEvents_("pointClick",d);if(u)return;r&&r.call(t,e,n)}var d={cancelable:!0,xval:t.lastx_,pts:t.selPoints_,canvasx:a.dragEndX,canvasy:a.dragEndY};t.cascadeEvents_("click",d)||i&&i.call(t,e,t.lastx_,t.selPoints_)},Dygraph.Interaction.endZoom=function(t,e,a){e.clearZoomRect_(),a.isZooming=!1,Dygraph.Interaction.maybeTreatMouseOpAsClick(t,e,a);var i=e.getArea();if(a.regionWidth>=10&&a.dragDirection==Dygraph.HORIZONTAL){var r=Math.min(a.dragStartX,a.dragEndX),n=Math.max(a.dragStartX,a.dragEndX);r=Math.max(r,i.x),n=Math.min(n,i.x+i.w),n>r&&e.doZoomX_(r,n),a.cancelNextDblclick=!0}else if(a.regionHeight>=10&&a.dragDirection==Dygraph.VERTICAL){var o=Math.min(a.dragStartY,a.dragEndY),s=Math.max(a.dragStartY,a.dragEndY);o=Math.max(o,i.y),s=Math.min(s,i.y+i.h),s>o&&e.doZoomY_(o,s),a.cancelNextDblclick=!0}a.dragStartX=null,a.dragStartY=null},Dygraph.Interaction.startTouch=function(t,e,a){t.preventDefault(),t.touches.length>1&&(a.startTimeForDoubleTapMs=null);for(var i=[],r=0;r=2){a.initialPinchCenter={pageX:.5*(i[0].pageX+i[1].pageX),pageY:.5*(i[0].pageY+i[1].pageY),dataX:.5*(i[0].dataX+i[1].dataX),dataY:.5*(i[0].dataY+i[1].dataY)};var o=180/Math.PI*Math.atan2(a.initialPinchCenter.pageY-i[0].pageY,i[0].pageX-a.initialPinchCenter.pageX);o=Math.abs(o),o>90&&(o=90-o),a.touchDirections={x:67.5>o,y:o>22.5}}a.initialRange={x:e.xAxisRange(),y:e.yAxisRange()}},Dygraph.Interaction.moveTouch=function(t,e,a){a.startTimeForDoubleTapMs=null;var i,r=[];for(i=0;i=2){var c=s[1].pageX-l.pageX;d=(r[1].pageX-o.pageX)/c;var y=s[1].pageY-l.pageY;u=(r[1].pageY-o.pageY)/y}d=Math.min(8,Math.max(.125,d)),u=Math.min(8,Math.max(.125,u));var _=!1;if(a.touchDirections.x&&(e.dateWindow_=[l.dataX-h.dataX+(a.initialRange.x[0]-l.dataX)/d,l.dataX-h.dataX+(a.initialRange.x[1]-l.dataX)/d],_=!0),a.touchDirections.y)for(i=0;1>i;i++){var v=e.axes_[i],f=e.attributes_.getForAxis("logscale",i);f||(v.valueWindow=[l.dataY-h.dataY+(a.initialRange.y[0]-l.dataY)/u,l.dataY-h.dataY+(a.initialRange.y[1]-l.dataY)/u],_=!0)}if(e.drawGraph_(!1),_&&r.length>1&&e.getFunctionOption("zoomCallback")){var x=e.xAxisRange();e.getFunctionOption("zoomCallback").call(e,x[0],x[1],e.yAxisRanges())}},Dygraph.Interaction.endTouch=function(t,e,a){if(0!==t.touches.length)Dygraph.Interaction.startTouch(t,e,a);else if(1==t.changedTouches.length){var i=(new 
Date).getTime(),r=t.changedTouches[0];a.startTimeForDoubleTapMs&&i-a.startTimeForDoubleTapMs<500&&a.doubleTapX&&Math.abs(a.doubleTapX-r.screenX)<50&&a.doubleTapY&&Math.abs(a.doubleTapY-r.screenY)<50?e.resetZoom():(a.startTimeForDoubleTapMs=i,a.doubleTapX=r.screenX,a.doubleTapY=r.screenY)}};var e=function(t,e,a){return e>t?e-t:t>a?t-a:0},a=function(t,a){var i=Dygraph.findPos(a.canvas_),r={left:i.x,right:i.x+a.canvas_.offsetWidth,top:i.y,bottom:i.y+a.canvas_.offsetHeight},n={x:Dygraph.pageX(t),y:Dygraph.pageY(t)},o=e(n.x,r.left,r.right),s=e(n.y,r.top,r.bottom);return Math.max(o,s)};Dygraph.Interaction.defaultModel={mousedown:function(e,i,r){if(!e.button||2!=e.button){r.initializeMouseDown(e,i,r),e.altKey||e.shiftKey?Dygraph.startPan(e,i,r):Dygraph.startZoom(e,i,r);var n=function(e){if(r.isZooming){var n=a(e,i);t>n?Dygraph.moveZoom(e,i,r):null!==r.dragEndX&&(r.dragEndX=null,r.dragEndY=null,i.clearZoomRect_())}else r.isPanning&&Dygraph.movePan(e,i,r)},o=function(t){r.isZooming?null!==r.dragEndX?Dygraph.endZoom(t,i,r):Dygraph.Interaction.maybeTreatMouseOpAsClick(t,i,r):r.isPanning&&Dygraph.endPan(t,i,r),Dygraph.removeEvent(document,"mousemove",n),Dygraph.removeEvent(document,"mouseup",o),r.destroy()};i.addAndTrackEvent(document,"mousemove",n),i.addAndTrackEvent(document,"mouseup",o)}},willDestroyContextMyself:!0,touchstart:function(t,e,a){Dygraph.Interaction.startTouch(t,e,a)},touchmove:function(t,e,a){Dygraph.Interaction.moveTouch(t,e,a)},touchend:function(t,e,a){Dygraph.Interaction.endTouch(t,e,a)},dblclick:function(t,e,a){if(a.cancelNextDblclick)return void(a.cancelNextDblclick=!1);var i={canvasx:a.dragEndX,canvasy:a.dragEndY};e.cascadeEvents_("dblclick",i)||t.altKey||t.shiftKey||e.resetZoom()}},Dygraph.DEFAULT_ATTRS.interactionModel=Dygraph.Interaction.defaultModel,Dygraph.defaultInteractionModel=Dygraph.Interaction.defaultModel,Dygraph.endZoom=Dygraph.Interaction.endZoom,Dygraph.moveZoom=Dygraph.Interaction.moveZoom,Dygraph.startZoom=Dygraph.Interaction.startZoom,Dygraph.endPan=Dygraph.Interaction.endPan,Dygraph.movePan=Dygraph.Interaction.movePan,Dygraph.startPan=Dygraph.Interaction.startPan,Dygraph.Interaction.nonInteractiveModel_={mousedown:function(t,e,a){a.initializeMouseDown(t,e,a)},mouseup:Dygraph.Interaction.maybeTreatMouseOpAsClick},Dygraph.Interaction.dragIsPanInteractionModel={mousedown:function(t,e,a){a.initializeMouseDown(t,e,a),Dygraph.startPan(t,e,a)},mousemove:function(t,e,a){a.isPanning&&Dygraph.movePan(t,e,a)},mouseup:function(t,e,a){a.isPanning&&Dygraph.endPan(t,e,a)}}}(),function(){"use strict";Dygraph.TickList=void 0,Dygraph.Ticker=void 0,Dygraph.numericLinearTicks=function(t,e,a,i,r,n){var o=function(t){return"logscale"===t?!1:i(t)};return Dygraph.numericTicks(t,e,a,o,r,n)},Dygraph.numericTicks=function(t,e,a,i,r,n){var o,s,l,h,p=i("pixelsPerLabel"),g=[];if(n)for(o=0;o=h/4){for(var y=u;y>=d;y--){var _=Dygraph.PREFERRED_LOG_TICK_VALUES[y],v=Math.log(_/t)/Math.log(e/t)*a,f={v:_};null===c?c={tickValue:_,pixel_coord:v}:Math.abs(v-c.pixel_coord)>=p?c={tickValue:_,pixel_coord:v}:f.label="",g.push(f)}g.reverse()}}if(0===g.length){var x,m,D=i("labelsKMG2");D?(x=[1,2,4,8,16,32,64,128,256],m=16):(x=[1,2,5,10,20,50,100],m=10);var w,A,b,T,E=Math.ceil(a/p),C=Math.abs(e-t)/E,L=Math.floor(Math.log(C)/Math.log(m)),P=Math.pow(m,L);for(s=0;sp));s++);for(A>b&&(w*=-1),o=0;h>=o;o++)l=A+o*w,g.push({v:l})}}var 
S=i("axisLabelFormatter");for(o=0;o=0?Dygraph.getDateAxis(t,e,o,i,r):[]},Dygraph.SECONDLY=0,Dygraph.TWO_SECONDLY=1,Dygraph.FIVE_SECONDLY=2,Dygraph.TEN_SECONDLY=3,Dygraph.THIRTY_SECONDLY=4,Dygraph.MINUTELY=5,Dygraph.TWO_MINUTELY=6,Dygraph.FIVE_MINUTELY=7,Dygraph.TEN_MINUTELY=8,Dygraph.THIRTY_MINUTELY=9,Dygraph.HOURLY=10,Dygraph.TWO_HOURLY=11,Dygraph.SIX_HOURLY=12,Dygraph.DAILY=13,Dygraph.TWO_DAILY=14,Dygraph.WEEKLY=15,Dygraph.MONTHLY=16,Dygraph.QUARTERLY=17,Dygraph.BIANNUAL=18,Dygraph.ANNUAL=19,Dygraph.DECADAL=20,Dygraph.CENTENNIAL=21,Dygraph.NUM_GRANULARITIES=22,Dygraph.DATEFIELD_Y=0,Dygraph.DATEFIELD_M=1,Dygraph.DATEFIELD_D=2,Dygraph.DATEFIELD_HH=3,Dygraph.DATEFIELD_MM=4,Dygraph.DATEFIELD_SS=5,Dygraph.DATEFIELD_MS=6,Dygraph.NUM_DATEFIELDS=7,Dygraph.TICK_PLACEMENT=[],Dygraph.TICK_PLACEMENT[Dygraph.SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:1,spacing:1e3},Dygraph.TICK_PLACEMENT[Dygraph.TWO_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:2,spacing:2e3},Dygraph.TICK_PLACEMENT[Dygraph.FIVE_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:5,spacing:5e3},Dygraph.TICK_PLACEMENT[Dygraph.TEN_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:10,spacing:1e4},Dygraph.TICK_PLACEMENT[Dygraph.THIRTY_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:30,spacing:3e4},Dygraph.TICK_PLACEMENT[Dygraph.MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:1,spacing:6e4},Dygraph.TICK_PLACEMENT[Dygraph.TWO_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:2,spacing:12e4},Dygraph.TICK_PLACEMENT[Dygraph.FIVE_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:5,spacing:3e5},Dygraph.TICK_PLACEMENT[Dygraph.TEN_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:10,spacing:6e5},Dygraph.TICK_PLACEMENT[Dygraph.THIRTY_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:30,spacing:18e5},Dygraph.TICK_PLACEMENT[Dygraph.HOURLY]={datefield:Dygraph.DATEFIELD_HH,step:1,spacing:36e5},Dygraph.TICK_PLACEMENT[Dygraph.TWO_HOURLY]={datefield:Dygraph.DATEFIELD_HH,step:2,spacing:72e5},Dygraph.TICK_PLACEMENT[Dygraph.SIX_HOURLY]={datefield:Dygraph.DATEFIELD_HH,step:6,spacing:216e5},Dygraph.TICK_PLACEMENT[Dygraph.DAILY]={datefield:Dygraph.DATEFIELD_D,step:1,spacing:864e5},Dygraph.TICK_PLACEMENT[Dygraph.TWO_DAILY]={datefield:Dygraph.DATEFIELD_D,step:2,spacing:1728e5},Dygraph.TICK_PLACEMENT[Dygraph.WEEKLY]={datefield:Dygraph.DATEFIELD_D,step:7,spacing:6048e5},Dygraph.TICK_PLACEMENT[Dygraph.MONTHLY]={datefield:Dygraph.DATEFIELD_M,step:1,spacing:2629817280},Dygraph.TICK_PLACEMENT[Dygraph.QUARTERLY]={datefield:Dygraph.DATEFIELD_M,step:3,spacing:216e5*365.2524},Dygraph.TICK_PLACEMENT[Dygraph.BIANNUAL]={datefield:Dygraph.DATEFIELD_M,step:6,spacing:432e5*365.2524},Dygraph.TICK_PLACEMENT[Dygraph.ANNUAL]={datefield:Dygraph.DATEFIELD_Y,step:1,spacing:864e5*365.2524},Dygraph.TICK_PLACEMENT[Dygraph.DECADAL]={datefield:Dygraph.DATEFIELD_Y,step:10,spacing:315578073600},Dygraph.TICK_PLACEMENT[Dygraph.CENTENNIAL]={datefield:Dygraph.DATEFIELD_Y,step:100,spacing:3155780736e3},Dygraph.PREFERRED_LOG_TICK_VALUES=function(){for(var t=[],e=-39;39>=e;e++)for(var a=Math.pow(10,e),i=1;9>=i;i++){var r=a*i;t.push(r)}return t}(),Dygraph.pickDateTickGranularity=function(t,e,a,i){for(var r=i("pixelsPerLabel"),n=0;n=r)return n}return-1},Dygraph.numDateTicks=function(t,e,a){var i=Dygraph.TICK_PLACEMENT[a].spacing;return Math.round(1*(e-t)/i)},Dygraph.getDateAxis=function(t,e,a,i,r){var n=i("axisLabelFormatter"),o=i("labelsUTC"),s=o?Dygraph.DateAccessorsUTC:Dygraph.DateAccessorsLocal,l=Dygraph.TICK_PLACEMENT[a].datefield,h=Dygraph.TICK_PLACEMENT[a].step,p=Dygraph.TICK_PLACEMENT[a].spacing,g=new 
Date(t),d=[];d[Dygraph.DATEFIELD_Y]=s.getFullYear(g),d[Dygraph.DATEFIELD_M]=s.getMonth(g),d[Dygraph.DATEFIELD_D]=s.getDate(g),d[Dygraph.DATEFIELD_HH]=s.getHours(g),d[Dygraph.DATEFIELD_MM]=s.getMinutes(g),d[Dygraph.DATEFIELD_SS]=s.getSeconds(g),d[Dygraph.DATEFIELD_MS]=s.getMilliseconds(g);var u=d[l]%h;a==Dygraph.WEEKLY&&(u=s.getDay(g)),d[l]-=u;for(var c=l+1;cv&&(v+=p,_=new Date(v));e>=v;)y.push({v:v,label:n.call(r,_,a,i,r)}),v+=p,_=new Date(v);else for(t>v&&(d[l]+=h,_=s.makeDate.apply(null,d),v=_.getTime());e>=v;)(a>=Dygraph.DAILY||s.getHours(_)%h===0)&&y.push({v:v,label:n.call(r,_,a,i,r)}),d[l]+=h,_=s.makeDate.apply(null,d),v=_.getTime();return y},Dygraph&&Dygraph.DEFAULT_ATTRS&&Dygraph.DEFAULT_ATTRS.axes&&Dygraph.DEFAULT_ATTRS.axes.x&&Dygraph.DEFAULT_ATTRS.axes.y&&Dygraph.DEFAULT_ATTRS.axes.y2&&(Dygraph.DEFAULT_ATTRS.axes.x.ticker=Dygraph.dateTicker,Dygraph.DEFAULT_ATTRS.axes.y.ticker=Dygraph.numericTicks,Dygraph.DEFAULT_ATTRS.axes.y2.ticker=Dygraph.numericTicks)}(),Dygraph.Plugins={},Dygraph.Plugins.Annotations=function(){"use strict";var t=function(){this.annotations_=[]};return t.prototype.toString=function(){return"Annotations Plugin"},t.prototype.activate=function(t){return{clearChart:this.clearChart,didDrawChart:this.didDrawChart}},t.prototype.detachLabels=function(){for(var t=0;to.x+o.w||h.canvasyo.y+o.h)){var p=h.annotation,g=6;p.hasOwnProperty("tickHeight")&&(g=p.tickHeight);var d=document.createElement("div");for(var u in r)r.hasOwnProperty(u)&&(d.style[u]=r[u]);p.hasOwnProperty("icon")||(d.className="dygraphDefaultAnnotation"),p.hasOwnProperty("cssClass")&&(d.className+=" "+p.cssClass);var c=p.hasOwnProperty("width")?p.width:16,y=p.hasOwnProperty("height")?p.height:16;if(p.hasOwnProperty("icon")){var _=document.createElement("img");_.src=p.icon,_.width=c,_.height=y,d.appendChild(_)}else h.annotation.hasOwnProperty("shortText")&&d.appendChild(document.createTextNode(h.annotation.shortText));var v=h.canvasx-c/2;d.style.left=v+"px";var f=0;if(p.attachAtBottom){var x=o.y+o.h-y-g;s[v]?x-=s[v]:s[v]=0,s[v]+=g+y,f=x}else f=h.canvasy-y-g;d.style.top=f+"px",d.style.width=c+"px",d.style.height=y+"px",d.title=h.annotation.text,d.style.color=e.colorsMap_[h.name],d.style.borderColor=e.colorsMap_[h.name],p.div=d,e.addAndTrackEvent(d,"click",n("clickHandler","annotationClickHandler",h,this)),e.addAndTrackEvent(d,"mouseover",n("mouseOverHandler","annotationMouseOverHandler",h,this)),e.addAndTrackEvent(d,"mouseout",n("mouseOutHandler","annotationMouseOutHandler",h,this)),e.addAndTrackEvent(d,"dblclick",n("dblClickHandler","annotationDblClickHandler",h,this)),i.appendChild(d),this.annotations_.push(d);var m=t.drawingContext;if(m.save(),m.strokeStyle=e.colorsMap_[h.name],m.beginPath(),p.attachAtBottom){var x=f+y;m.moveTo(h.canvasx,x),m.lineTo(h.canvasx,x+g)}else m.moveTo(h.canvasx,h.canvasy),m.lineTo(h.canvasx,h.canvasy-2-g);m.closePath(),m.stroke(),m.restore()}}},t.prototype.destroy=function(){this.detachLabels()},t}(),Dygraph.Plugins.Axes=function(){"use strict";var t=function(){this.xlabels_=[],this.ylabels_=[]};return t.prototype.toString=function(){return"Axes Plugin"},t.prototype.activate=function(t){return{layout:this.layout,clearChart:this.clearChart,willDrawChart:this.willDrawChart}},t.prototype.layout=function(t){var e=t.dygraph;if(e.getOptionForAxis("drawAxis","y")){var a=e.getOptionForAxis("axisLabelWidth","y")+2*e.getOptionForAxis("axisTickSize","y");t.reserveSpaceLeft(a)}if(e.getOptionForAxis("drawAxis","x")){var 
i;i=e.getOption("xAxisHeight")?e.getOption("xAxisHeight"):e.getOptionForAxis("axisLabelFontSize","x")+2*e.getOptionForAxis("axisTickSize","x"),t.reserveSpaceBottom(i)}if(2==e.numAxes()){if(e.getOptionForAxis("drawAxis","y2")){var a=e.getOptionForAxis("axisLabelWidth","y2")+2*e.getOptionForAxis("axisTickSize","y2");t.reserveSpaceRight(a)}}else e.numAxes()>2&&e.error("Only two y-axes are supported at this time. (Trying to use "+e.numAxes()+")")},t.prototype.detachLabels=function(){function t(t){for(var e=0;e0){var x=i.numAxes(),m=[f("y"),f("y2")];for(l=0;l<_.yticks.length;l++){if(s=_.yticks[l],"function"==typeof s)return;n=v.x;var D=1,w="y1",A=m[0];1==s[0]&&(n=v.x+v.w,D=-1,w="y2",A=m[1]);var b=A("axisLabelFontSize");o=v.y+s[1]*v.h,r=y(s[2],"y",2==x?w:null);var T=o-b/2;0>T&&(T=0),T+b+3>d?r.style.bottom="0":r.style.top=T+"px",0===s[0]?(r.style.left=v.x-A("axisLabelWidth")-A("axisTickSize")+"px",r.style.textAlign="right"):1==s[0]&&(r.style.left=v.x+v.w+A("axisTickSize")+"px",r.style.textAlign="left"),r.style.width=A("axisLabelWidth")+"px",p.appendChild(r),this.ylabels_.push(r)}var E=this.ylabels_[0],b=i.getOptionForAxis("axisLabelFontSize","y"),C=parseInt(E.style.top,10)+b;C>d-b&&(E.style.top=parseInt(E.style.top,10)-b/2+"px")}var L;if(i.getOption("drawAxesAtZero")){var P=i.toPercentXCoord(0);(P>1||0>P||isNaN(P))&&(P=0),L=e(v.x+P*v.w)}else L=e(v.x);h.strokeStyle=i.getOptionForAxis("axisLineColor","y"),h.lineWidth=i.getOptionForAxis("axisLineWidth","y"),h.beginPath(),h.moveTo(L,a(v.y)),h.lineTo(L,a(v.y+v.h)),h.closePath(),h.stroke(),2==i.numAxes()&&(h.strokeStyle=i.getOptionForAxis("axisLineColor","y2"),h.lineWidth=i.getOptionForAxis("axisLineWidth","y2"),h.beginPath(),h.moveTo(a(v.x+v.w),a(v.y)),h.lineTo(a(v.x+v.w),a(v.y+v.h)),h.closePath(),h.stroke())}if(i.getOptionForAxis("drawAxis","x")){if(_.xticks){var A=f("x");for(l=0;l<_.xticks.length;l++){s=_.xticks[l],n=v.x+s[0]*v.w,o=v.y+v.h,r=y(s[1],"x"),r.style.textAlign="center",r.style.top=o+A("axisTickSize")+"px";var S=n-A("axisLabelWidth")/2;S+A("axisLabelWidth")>g&&(S=g-A("axisLabelWidth"),r.style.textAlign="right"),0>S&&(S=0,r.style.textAlign="left"),r.style.left=S+"px",r.style.width=A("axisLabelWidth")+"px", +p.appendChild(r),this.xlabels_.push(r)}}h.strokeStyle=i.getOptionForAxis("axisLineColor","x"),h.lineWidth=i.getOptionForAxis("axisLineWidth","x"),h.beginPath();var O;if(i.getOption("drawAxesAtZero")){var P=i.toPercentYCoord(0,0);(P>1||0>P)&&(P=1),O=a(v.y+P*v.h)}else O=a(v.y+v.h);h.moveTo(e(v.x),O),h.lineTo(e(v.x+v.w),O),h.closePath(),h.stroke()}h.restore()}},t}(),Dygraph.Plugins.ChartLabels=function(){"use strict";var t=function(){this.title_div_=null,this.xlabel_div_=null,this.ylabel_div_=null,this.y2label_div_=null};t.prototype.toString=function(){return"ChartLabels Plugin"},t.prototype.activate=function(t){return{layout:this.layout,didDrawChart:this.didDrawChart}};var e=function(t){var e=document.createElement("div");return e.style.position="absolute",e.style.left=t.x+"px",e.style.top=t.y+"px",e.style.width=t.w+"px",e.style.height=t.h+"px",e};t.prototype.detachLabels_=function(){for(var t=[this.title_div_,this.xlabel_div_,this.ylabel_div_,this.y2label_div_],e=0;e=2);for(o=h.yticks,l.save(),n=0;n=2;for(y&&l.installPattern(_),l.strokeStyle=s.getOptionForAxis("gridLineColor","x"),l.lineWidth=s.getOptionForAxis("gridLineWidth","x"),n=0;n/g,">")};return t.prototype.select=function(e){var a=e.selectedX,i=e.selectedPoints,r=e.selectedRow,n=e.dygraph.getOption("legend");if("never"===n)return 
void(this.legend_div_.style.display="none");if("follow"===n){var o=e.dygraph.plotter_.area,s=e.dygraph.getOption("labelsDivWidth"),l=e.dygraph.getOptionForAxis("axisLabelWidth","y"),h=i[0].x*o.w+20,p=i[0].y*o.h-20;h+s+1>window.scrollX+window.innerWidth&&(h=h-40-s-(l-o.x)),e.dygraph.graphDiv.appendChild(this.legend_div_),this.legend_div_.style.left=l+h+"px",this.legend_div_.style.top=p+"px"}var g=t.generateLegendHTML(e.dygraph,a,i,this.one_em_width_,r);this.legend_div_.innerHTML=g,this.legend_div_.style.display=""},t.prototype.deselect=function(e){var i=e.dygraph.getOption("legend");"always"!==i&&(this.legend_div_.style.display="none");var r=a(this.legend_div_);this.one_em_width_=r;var n=t.generateLegendHTML(e.dygraph,void 0,void 0,r,null);this.legend_div_.innerHTML=n},t.prototype.didDrawChart=function(t){this.deselect(t)},t.prototype.predraw=function(t){if(this.is_generated_div_){t.dygraph.graphDiv.appendChild(this.legend_div_);var e=t.dygraph.plotter_.area,a=t.dygraph.getOption("labelsDivWidth");this.legend_div_.style.left=e.x+e.w-a-1+"px",this.legend_div_.style.top=e.y+"px",this.legend_div_.style.width=a+"px"}},t.prototype.destroy=function(){this.legend_div_=null},t.generateLegendHTML=function(t,a,r,n,o){if(t.getOption("showLabelsOnHighlight")!==!0)return"";var s,l,h,p,g,d=t.getLabels();if("undefined"==typeof a){if("always"!=t.getOption("legend"))return"";for(l=t.getOption("labelsSeparateLines"),s="",h=1;h":" "),g=t.getOption("strokePattern",d[h]),p=e(g,u.color,n),s+=""+p+" "+i(d[h])+"")}return s}var c=t.optionsViewForAxis_("x"),y=c("valueFormatter");s=y.call(t,a,c,d[0],t,o,0),""!==s&&(s+=":");var _=[],v=t.numAxes();for(h=0;v>h;h++)_[h]=t.optionsViewForAxis_("y"+(h?1+h:""));var f=t.getOption("labelsShowZeroValues");l=t.getOption("labelsSeparateLines");var x=t.getHighlightSeries();for(h=0;h");var u=t.getPropertiesForSeries(m.name),D=_[u.axis-1],w=D("valueFormatter"),A=w.call(t,m.yval,D,m.name,t,o,d.indexOf(m.name)),b=m.name==x?" class='highlight'":"";s+=" "+i(m.name)+": "+A+""}}return s},e=function(t,e,a){var i=/MSIE/.test(navigator.userAgent)&&!window.opera;if(i)return"—";if(!t||t.length<=1)return'
      ';var r,n,o,s,l,h=0,p=0,g=[];for(r=0;r<=t.length;r++)h+=t[r%t.length];if(l=Math.floor(a/(h-t[0])),l>1){for(r=0;rn;n++)for(r=0;p>r;r+=2)o=g[r%g.length],s=r';return d},t}(),Dygraph.Plugins.RangeSelector=function(){"use strict";var t=function(){this.isIE_=/MSIE/.test(navigator.userAgent)&&!window.opera,this.hasTouchInterface_="undefined"!=typeof TouchEvent,this.isMobileDevice_=/mobile|android/gi.test(navigator.appVersion),this.interfaceCreated_=!1};return t.prototype.toString=function(){return"RangeSelector Plugin"},t.prototype.activate=function(t){return this.dygraph_=t,this.isUsingExcanvas_=t.isUsingExcanvas_,this.getOption_("showRangeSelector")&&this.createInterface_(),{layout:this.reserveSpace_,predraw:this.renderStaticLayer_,didDrawChart:this.renderInteractiveLayer_}},t.prototype.destroy=function(){this.bgcanvas_=null,this.fgcanvas_=null,this.leftZoomHandle_=null,this.rightZoomHandle_=null,this.iePanOverlay_=null},t.prototype.getOption_=function(t,e){return this.dygraph_.getOption(t,e)},t.prototype.setDefaultOption_=function(t,e){this.dygraph_.attrs_[t]=e},t.prototype.createInterface_=function(){this.createCanvases_(),this.isUsingExcanvas_&&this.createIEPanOverlay_(),this.createZoomHandles_(),this.initInteraction_(),this.getOption_("animatedZooms")&&(console.warn("Animated zooms and range selector are not compatible; disabling animatedZooms."),this.dygraph_.updateOptions({animatedZooms:!1},!0)),this.interfaceCreated_=!0,this.addToGraph_()},t.prototype.addToGraph_=function(){var t=this.graphDiv_=this.dygraph_.graphDiv;t.appendChild(this.bgcanvas_),t.appendChild(this.fgcanvas_),t.appendChild(this.leftZoomHandle_),t.appendChild(this.rightZoomHandle_)},t.prototype.removeFromGraph_=function(){var t=this.graphDiv_;t.removeChild(this.bgcanvas_),t.removeChild(this.fgcanvas_),t.removeChild(this.leftZoomHandle_),t.removeChild(this.rightZoomHandle_),this.graphDiv_=null},t.prototype.reserveSpace_=function(t){this.getOption_("showRangeSelector")&&t.reserveSpaceBottom(this.getOption_("rangeSelectorHeight")+4)},t.prototype.renderStaticLayer_=function(){this.updateVisibility_()&&(this.resize_(),this.drawStaticLayer_())},t.prototype.renderInteractiveLayer_=function(){this.updateVisibility_()&&!this.isChangingRange_&&(this.placeZoomHandles_(),this.drawInteractiveLayer_())},t.prototype.updateVisibility_=function(){var t=this.getOption_("showRangeSelector");if(t)this.interfaceCreated_?this.graphDiv_&&this.graphDiv_.parentNode||this.addToGraph_():this.createInterface_();else if(this.graphDiv_){this.removeFromGraph_();var e=this.dygraph_;setTimeout(function(){e.width_=0,e.resize()},1)}return t},t.prototype.resize_=function(){function t(t,e,a){var i=Dygraph.getContextPixelRatio(e);t.style.top=a.y+"px",t.style.left=a.x+"px",t.width=a.w*i,t.height=a.h*i,t.style.width=a.w+"px",t.style.height=a.h+"px",1!=i&&e.scale(i,i)}var 
e=this.dygraph_.layout_.getPlotArea(),a=0;this.dygraph_.getOptionForAxis("drawAxis","x")&&(a=this.getOption_("xAxisHeight")||this.getOption_("axisLabelFontSize")+2*this.getOption_("axisTickSize")),this.canvasRect_={x:e.x,y:e.y+e.h+a+4,w:e.w,h:this.getOption_("rangeSelectorHeight")},t(this.bgcanvas_,this.bgcanvas_ctx_,this.canvasRect_),t(this.fgcanvas_,this.fgcanvas_ctx_,this.canvasRect_)},t.prototype.createCanvases_=function(){this.bgcanvas_=Dygraph.createCanvas(),this.bgcanvas_.className="dygraph-rangesel-bgcanvas",this.bgcanvas_.style.position="absolute",this.bgcanvas_.style.zIndex=9,this.bgcanvas_ctx_=Dygraph.getContext(this.bgcanvas_),this.fgcanvas_=Dygraph.createCanvas(),this.fgcanvas_.className="dygraph-rangesel-fgcanvas",this.fgcanvas_.style.position="absolute",this.fgcanvas_.style.zIndex=9,this.fgcanvas_.style.cursor="default",this.fgcanvas_ctx_=Dygraph.getContext(this.fgcanvas_)},t.prototype.createIEPanOverlay_=function(){this.iePanOverlay_=document.createElement("div"),this.iePanOverlay_.style.position="absolute",this.iePanOverlay_.style.backgroundColor="white",this.iePanOverlay_.style.filter="alpha(opacity=0)",this.iePanOverlay_.style.display="none",this.iePanOverlay_.style.cursor="move",this.fgcanvas_.appendChild(this.iePanOverlay_)},t.prototype.createZoomHandles_=function(){var t=new Image;t.className="dygraph-rangesel-zoomhandle",t.style.position="absolute",t.style.zIndex=10,t.style.visibility="hidden",t.style.cursor="col-resize",/MSIE 7/.test(navigator.userAgent)?(t.width=7,t.height=14,t.style.backgroundColor="white",t.style.border="1px solid #333333"):(t.width=9,t.height=16,t.src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAkAAAAQCAYAAADESFVDAAAAAXNSR0IArs4c6QAAAAZiS0dEANAAzwDP4Z7KegAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAAd0SU1FB9sHGw0cMqdt1UwAAAAZdEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIEdJTVBXgQ4XAAAAaElEQVQoz+3SsRFAQBCF4Z9WJM8KCDVwownl6YXsTmCUsyKGkZzcl7zkz3YLkypgAnreFmDEpHkIwVOMfpdi9CEEN2nGpFdwD03yEqDtOgCaun7sqSTDH32I1pQA2Pb9sZecAxc5r3IAb21d6878xsAAAAAASUVORK5CYII="),this.isMobileDevice_&&(t.width*=2,t.height*=2),this.leftZoomHandle_=t,this.rightZoomHandle_=t.cloneNode(!1)},t.prototype.initInteraction_=function(){var t,e,a,i,r,n,o,s,l,h,p,g,d,u,c=this,y=document,_=0,v=null,f=!1,x=!1,m=!this.isMobileDevice_&&!this.isUsingExcanvas_,D=new Dygraph.IFrameTarp;t=function(t){var e=c.dygraph_.xAxisExtremes(),a=(e[1]-e[0])/c.canvasRect_.w,i=e[0]+(t.leftHandlePos-c.canvasRect_.x)*a,r=e[0]+(t.rightHandlePos-c.canvasRect_.x)*a;return[i,r]},e=function(t){return Dygraph.cancelEvent(t),f=!0,_=t.clientX,v=t.target?t.target:t.srcElement,("mousedown"===t.type||"dragstart"===t.type)&&(Dygraph.addEvent(y,"mousemove",a),Dygraph.addEvent(y,"mouseup",i)),c.fgcanvas_.style.cursor="col-resize",D.cover(),!0},a=function(t){if(!f)return!1;Dygraph.cancelEvent(t);var e=t.clientX-_;if(Math.abs(e)<4)return!0;_=t.clientX;var a,i=c.getZoomHandleStatus_();v==c.leftZoomHandle_?(a=i.leftHandlePos+e,a=Math.min(a,i.rightHandlePos-v.width-3),a=Math.max(a,c.canvasRect_.x)):(a=i.rightHandlePos+e,a=Math.min(a,c.canvasRect_.x+c.canvasRect_.w),a=Math.max(a,i.leftHandlePos+v.width+3));var n=v.width/2;return v.style.left=a-n+"px",c.drawInteractiveLayer_(),m&&r(),!0},i=function(t){return f?(f=!1,D.uncover(),Dygraph.removeEvent(y,"mousemove",a),Dygraph.removeEvent(y,"mouseup",i),c.fgcanvas_.style.cursor="default",m||r(),!0):!1},r=function(){try{var e=c.getZoomHandleStatus_();if(c.isChangingRange_=!0,e.isZoomed){var a=t(e);c.dygraph_.doZoomXDates_(a[0],a[1])}else 
c.dygraph_.resetZoom()}finally{c.isChangingRange_=!1}},n=function(t){if(c.isUsingExcanvas_)return t.srcElement==c.iePanOverlay_;var e=c.leftZoomHandle_.getBoundingClientRect(),a=e.left+e.width/2;e=c.rightZoomHandle_.getBoundingClientRect();var i=e.left+e.width/2;return t.clientX>a&&t.clientX=c.canvasRect_.x+c.canvasRect_.w?(r=c.canvasRect_.x+c.canvasRect_.w,i=r-n):(i+=e,r+=e);var o=c.leftZoomHandle_.width/2;return c.leftZoomHandle_.style.left=i-o+"px",c.rightZoomHandle_.style.left=r-o+"px",c.drawInteractiveLayer_(),m&&h(),!0},l=function(t){return x?(x=!1,Dygraph.removeEvent(y,"mousemove",s),Dygraph.removeEvent(y,"mouseup",l),m||h(),!0):!1},h=function(){try{c.isChangingRange_=!0,c.dygraph_.dateWindow_=t(c.getZoomHandleStatus_()),c.dygraph_.drawGraph_(!1)}finally{c.isChangingRange_=!1}},p=function(t){if(!f&&!x){var e=n(t)?"move":"default";e!=c.fgcanvas_.style.cursor&&(c.fgcanvas_.style.cursor=e)}},g=function(t){"touchstart"==t.type&&1==t.targetTouches.length?e(t.targetTouches[0])&&Dygraph.cancelEvent(t):"touchmove"==t.type&&1==t.targetTouches.length?a(t.targetTouches[0])&&Dygraph.cancelEvent(t):i(t)},d=function(t){"touchstart"==t.type&&1==t.targetTouches.length?o(t.targetTouches[0])&&Dygraph.cancelEvent(t):"touchmove"==t.type&&1==t.targetTouches.length?s(t.targetTouches[0])&&Dygraph.cancelEvent(t):l(t)},u=function(t,e){for(var a=["touchstart","touchend","touchmove","touchcancel"],i=0;it;t++){var s=this.getOption_("showInRangeSelector",r[t]);n[t]=s,null!==s&&(o=!0)}if(!o)for(t=0;t1&&(g=h.rollingAverage(g,e.rollPeriod(),p)),l.push(g)}var d=[];for(t=0;t0)&&(v=Math.min(v,x),f=Math.max(f,x))}var m=.25;if(a)for(f=Dygraph.log10(f),f+=f*m,v=Dygraph.log10(v),t=0;tthis.canvasRect_.x||a+10&&t[r][0]>o;)i--,r--}return i>=a?[a,i]:[0,t.length-1]},t.parseFloat=function(t){return null===t?0/0:t}}(),function(){"use strict";Dygraph.DataHandlers.DefaultHandler=function(){};var t=Dygraph.DataHandlers.DefaultHandler;t.prototype=new Dygraph.DataHandler,t.prototype.extractSeries=function(t,e,a){for(var i=[],r=a.get("logscale"),n=0;n=s&&(s=null),i.push([o,s])}return i},t.prototype.rollingAverage=function(t,e,a){e=Math.min(e,t.length);var i,r,n,o,s,l=[];if(1==e)return t;for(i=0;ir;r++)n=t[r][1],null===n||isNaN(n)||(s++,o+=t[r][1]);s?l[i]=[t[i][0],o/s]:l[i]=[t[i][0],null]}return l},t.prototype.getExtremeYValues=function(t,e,a){for(var i,r=null,n=null,o=0,s=t.length-1,l=o;s>=l;l++)i=t[l][1],null===i||isNaN(i)||((null===n||i>n)&&(n=i),(null===r||r>i)&&(r=i));return[r,n]}}(),function(){"use strict";Dygraph.DataHandlers.DefaultFractionHandler=function(){};var t=Dygraph.DataHandlers.DefaultFractionHandler;t.prototype=new Dygraph.DataHandlers.DefaultHandler,t.prototype.extractSeries=function(t,e,a){for(var i,r,n,o,s,l,h=[],p=100,g=a.get("logscale"),d=0;d=0&&(n-=t[i-e][2][0],o-=t[i-e][2][1]);var l=t[i][0],h=o?n/o:0;r[i]=[l,s*h]}return r}}(),function(){"use strict";Dygraph.DataHandlers.BarsHandler=function(){Dygraph.DataHandler.call(this)},Dygraph.DataHandlers.BarsHandler.prototype=new Dygraph.DataHandler;var t=Dygraph.DataHandlers.BarsHandler;t.prototype.extractSeries=function(t,e,a){},t.prototype.rollingAverage=function(t,e,a){},t.prototype.onPointsCreated_=function(t,e){for(var a=0;a=l;l++)if(i=t[l][1],null!==i&&!isNaN(i)){var h=t[l][2][0],p=t[l][2][1];h>i&&(h=i),i>p&&(p=i),(null===n||p>n)&&(n=p),(null===r||r>h)&&(r=h)}return[r,n]},t.prototype.onLineEvaluated=function(t,e,a){for(var i,r=0;r=0){var 
g=t[l-e];null===g[1]||isNaN(g[1])||(r-=g[2][0],o-=g[1],n-=g[2][1],s-=1)}s?p[l]=[t[l][0],1*o/s,[1*r/s,1*n/s]]:p[l]=[t[l][0],null,[null,null]]}return p}}(),function(){"use strict";Dygraph.DataHandlers.ErrorBarsHandler=function(){};var t=Dygraph.DataHandlers.ErrorBarsHandler;t.prototype=new Dygraph.DataHandlers.BarsHandler,t.prototype.extractSeries=function(t,e,a){for(var i,r,n,o,s=[],l=a.get("sigma"),h=a.get("logscale"),p=0;pr;r++)n=t[r][1],null===n||isNaN(n)||(l++,s+=n,p+=Math.pow(t[r][2][2],2));l?(h=Math.sqrt(p)/l,g=s/l,d[i]=[t[i][0],g,[g-u*h,g+u*h]]):(o=1==e?t[i][1]:null,d[i]=[t[i][0],o,[o,o]])}return d}}(),function(){"use strict";Dygraph.DataHandlers.FractionsBarsHandler=function(){};var t=Dygraph.DataHandlers.FractionsBarsHandler;t.prototype=new Dygraph.DataHandlers.BarsHandler,t.prototype.extractSeries=function(t,e,a){for(var i,r,n,o,s,l,h,p,g=[],d=100,u=a.get("sigma"),c=a.get("logscale"),y=0;y=0&&(p-=t[n-e][2][2],g-=t[n-e][2][3]);var u=t[n][0],c=g?p/g:0;if(h)if(g){var y=0>c?0:c,_=g,v=l*Math.sqrt(y*(1-y)/_+l*l/(4*_*_)),f=1+l*l/g;i=(y+l*l/(2*g)-v)/f,r=(y+l*l/(2*g)+v)/f,s[n]=[u,y*d,[i*d,r*d]]}else s[n]=[u,0,[0,0]];else o=g?l*Math.sqrt(c*(1-c)/g):1,s[n]=[u,d*c,[d*(c-o),d*(c+o)]]}return s}}(); +//# sourceMappingURL=dygraph-combined.js.map \ No newline at end of file diff --git a/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph.css b/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph.css new file mode 100644 index 000000000..4745b2fc2 --- /dev/null +++ b/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph.css @@ -0,0 +1,8 @@ + +div .dygraphs input[type="text"] { + width: 25px; +} + +div .qt .dygraph-axis-label { + font-size: 11px; +} \ No newline at end of file diff --git a/previous_versions/v0.4.0/libs/dygraphs-1.1.1/shapes.js b/previous_versions/v0.4.0/libs/dygraphs-1.1.1/shapes.js new file mode 100644 index 000000000..2df07a9b8 --- /dev/null +++ b/previous_versions/v0.4.0/libs/dygraphs-1.1.1/shapes.js @@ -0,0 +1,123 @@ +/** + * @license + * Copyright 2011 Dan Vanderkam (danvdk@gmail.com) + * MIT-licensed (http://opensource.org/licenses/MIT) + */ + +/** + * @fileoverview + * Including this file will add several additional shapes to Dygraph.Circles + * which can be passed to drawPointCallback. + * See tests/custom-circles.html for usage. + */ + +(function() { + +/** + * @param {!CanvasRenderingContext2D} ctx the canvas context + * @param {number} sides the number of sides in the shape. + * @param {number} radius the radius of the image. + * @param {number} cx center x coordate + * @param {number} cy center y coordinate + * @param {number=} rotationRadians the shift of the initial angle, in radians. + * @param {number=} delta the angle shift for each line. If missing, creates a + * regular polygon. + */ +var regularShape = function( + ctx, sides, radius, cx, cy, rotationRadians, delta) { + rotationRadians = rotationRadians || 0; + delta = delta || Math.PI * 2 / sides; + + ctx.beginPath(); + var initialAngle = rotationRadians; + var angle = initialAngle; + + var computeCoordinates = function() { + var x = cx + (Math.sin(angle) * radius); + var y = cy + (-Math.cos(angle) * radius); + return [x, y]; + }; + + var initialCoordinates = computeCoordinates(); + var x = initialCoordinates[0]; + var y = initialCoordinates[1]; + ctx.moveTo(x, y); + + for (var idx = 0; idx < sides; idx++) { + angle = (idx == sides - 1) ? 
initialAngle : (angle + delta); + var coords = computeCoordinates(); + ctx.lineTo(coords[0], coords[1]); + } + ctx.fill(); + ctx.stroke(); +}; + +/** + * TODO(danvk): be more specific on the return type. + * @param {number} sides + * @param {number=} rotationRadians + * @param {number=} delta + * @return {Function} + * @private + */ +var shapeFunction = function(sides, rotationRadians, delta) { + return function(g, name, ctx, cx, cy, color, radius) { + ctx.strokeStyle = color; + ctx.fillStyle = "white"; + regularShape(ctx, sides, radius, cx, cy, rotationRadians, delta); + }; +}; + +var customCircles = { + TRIANGLE : shapeFunction(3), + SQUARE : shapeFunction(4, Math.PI / 4), + DIAMOND : shapeFunction(4), + PENTAGON : shapeFunction(5), + HEXAGON : shapeFunction(6), + CIRCLE : function(g, name, ctx, cx, cy, color, radius) { + ctx.beginPath(); + ctx.strokeStyle = color; + ctx.fillStyle = "white"; + ctx.arc(cx, cy, radius, 0, 2 * Math.PI, false); + ctx.fill(); + ctx.stroke(); + }, + STAR : shapeFunction(5, 0, 4 * Math.PI / 5), + PLUS : function(g, name, ctx, cx, cy, color, radius) { + ctx.strokeStyle = color; + + ctx.beginPath(); + ctx.moveTo(cx + radius, cy); + ctx.lineTo(cx - radius, cy); + ctx.closePath(); + ctx.stroke(); + + ctx.beginPath(); + ctx.moveTo(cx, cy + radius); + ctx.lineTo(cx, cy - radius); + ctx.closePath(); + ctx.stroke(); + }, + EX : function(g, name, ctx, cx, cy, color, radius) { + ctx.strokeStyle = color; + + ctx.beginPath(); + ctx.moveTo(cx + radius, cy + radius); + ctx.lineTo(cx - radius, cy - radius); + ctx.closePath(); + ctx.stroke(); + + ctx.beginPath(); + ctx.moveTo(cx + radius, cy - radius); + ctx.lineTo(cx - radius, cy + radius); + ctx.closePath(); + ctx.stroke(); + } +}; + +for (var k in customCircles) { + if (!customCircles.hasOwnProperty(k)) continue; + Dygraph.Circles[k] = customCircles[k]; +} + +})(); diff --git a/previous_versions/v0.4.0/libs/dygraphs-binding-1.1.1.6/dygraphs.js b/previous_versions/v0.4.0/libs/dygraphs-binding-1.1.1.6/dygraphs.js new file mode 100644 index 000000000..3cd03913f --- /dev/null +++ b/previous_versions/v0.4.0/libs/dygraphs-binding-1.1.1.6/dygraphs.js @@ -0,0 +1,789 @@ + +// polyfill indexOf for IE8 +if (!Array.prototype.indexOf) { + Array.prototype.indexOf = function(elt /*, from*/) { + var len = this.length >>> 0; + + var from = Number(arguments[1]) || 0; + from = (from < 0) + ? 
Math.ceil(from) + : Math.floor(from); + if (from < 0) + from += len; + + for (; from < len; from++) { + if (from in this && + this[from] === elt) + return from; + } + return -1; + }; +} + +HTMLWidgets.widget({ + + name: "dygraphs", + + type: "output", + + factory: function(el, width, height) { + + // reference to dygraph + var dygraph = null; + + // reference to widget global groups + var groups = this.groups; + + // add qt style if we are running under Qt + if (window.navigator.userAgent.indexOf(" Qt/") > 0) + el.className += " qt"; + + return { + + renderValue: function(x) { + + // reference to this for closures + var thiz = this; + + // get dygraph attrs and populate file field + var attrs = x.attrs; + attrs.file = x.data; + + // disable zoom interaction except for clicks + if (attrs.disableZoom) { + attrs.interactionModel = Dygraph.Interaction.nonInteractiveModel_; + } + + // convert non-arrays to arrays + for (var index = 0; index < attrs.file.length; index++) { + if (!$.isArray(attrs.file[index])) + attrs.file[index] = [].concat(attrs.file[index]); + } + + // resolve "auto" legend behavior + if (x.attrs.legend == "auto") { + if (x.data.length <= 2) + x.attrs.legend = "onmouseover"; + else + x.attrs.legend = "always"; + } + + if (x.format == "date") { + + // set appropriated function in case of fixed tz + if ((attrs.axes.x.axisLabelFormatter === undefined) && x.fixedtz) + attrs.axes.x.axisLabelFormatter = this.xAxisLabelFormatterFixedTZ(x.tzone); + + if ((attrs.axes.x.valueFormatter === undefined) && x.fixedtz) + attrs.axes.x.valueFormatter = this.xValueFormatterFixedTZ(x.scale, x.tzone); + + if ((attrs.axes.x.ticker === undefined) && x.fixedtz) + attrs.axes.x.ticker = this.customDateTickerFixedTZ(x.tzone); + + // provide an automatic x value formatter if none is already specified + if ((attrs.axes.x.valueFormatter === undefined) && (x.fixedtz != true)) + attrs.axes.x.valueFormatter = this.xValueFormatter(x.scale); + + // convert time to js time + attrs.file[0] = attrs.file[0].map(function(value) { + return thiz.normalizeDateValue(x.scale, value, x.fixedtz); + }); + if (attrs.dateWindow != null) { + attrs.dateWindow = attrs.dateWindow.map(function(value) { + var date = thiz.normalizeDateValue(x.scale, value, x.fixedtz); + return date.getTime(); + }); + } + } + + + // transpose array + attrs.file = HTMLWidgets.transposeArray2D(attrs.file); + + // add drawCallback for group + if (x.group != null) + this.addGroupDrawCallback(x); + + // add shading and event callback if necessary + this.addShadingCallback(x); + this.addEventCallback(x); + this.addZoomCallback(x); + + // disable y-axis touch events on mobile phones + if (attrs.mobileDisableYTouch !== false && this.isMobilePhone()) { + // create default interaction model if necessary + if (!attrs.interactionModel) + attrs.interactionModel = Dygraph.Interaction.defaultModel; + // disable y touch direction + attrs.interactionModel.touchstart = function(event, dygraph, context) { + Dygraph.defaultInteractionModel.touchstart(event, dygraph, context); + context.touchDirections = { x: true, y: false }; + }; + } + + // create plugins + if (x.plugins) { + attrs.plugins = []; + for (var plugin in x.plugins) { + if (x.plugins.hasOwnProperty(plugin)) { + + // get plugin options + var options = x.plugins[plugin]; + + // create plugin and add to dygraph + var p = new Dygraph.Plugins[plugin](options); + attrs.plugins.push(p); + } + } + } + + // custom plotter + if (x.plotter) { + attrs.plotter = Dygraph.Plotters[x.plotter]; + } + + // custom data handler 
+ if (x.dataHandler) { + attrs.dataHandler = Dygraph.DataHandlers[x.dataHandler]; + } + + // custom circles + if (x.pointShape) { + if (typeof x.pointShape === 'string') { + attrs.drawPointCallback = Dygraph.Circles[x.pointShape.toUpperCase()]; + attrs.drawHighlightPointCallback = Dygraph.Circles[x.pointShape.toUpperCase()]; + } else { + for (var s in x.pointShape) { + if (x.pointShape.hasOwnProperty(s)) { + attrs.series[s].drawPointCallback = Dygraph.Circles[x.pointShape[s].toUpperCase()]; + attrs.series[s].drawHighlightPointCallback = Dygraph.Circles[x.pointShape[s].toUpperCase()]; + } + } + } + } + + // if there is no existing dygraph perform initialization + if (!dygraph) { + + // subscribe to custom shown event (fired by ioslides to trigger + // shiny reactivity but we can use it as well). this is necessary + // because if a dygraph starts out as display:none it has height + // and width == 0 and this doesn't change when it becomes visible + $(el).closest('slide').on('shown', function() { + if (dygraph) + dygraph.resize(); + }); + + // do the same for reveal.js + $(el).closest('section.slide').on('shown', function() { + if (dygraph) + dygraph.resize(); + }); + + // redraw on R Markdown {.tabset} tab visibility changed + var tab = $(el).closest('div.tab-pane'); + if (tab !== null) { + var tabID = tab.attr('id'); + var tabAnchor = $('a[data-toggle="tab"][href="#' + tabID + '"]'); + if (tabAnchor !== null) { + tabAnchor.on('shown.bs.tab', function() { + if (dygraph) + dygraph.resize(); + }); + } + } + // add default font for viewer mode + if (this.queryVar("viewer_pane") === "1") + document.body.style.fontFamily = "Arial, sans-serif"; + + // inject css if necessary + if (x.css != null) { + var style = document.createElement('style'); + style.type = 'text/css'; + if (style.styleSheet) + style.styleSheet.cssText = x.css; + else + style.appendChild(document.createTextNode(x.css)); + document.getElementsByTagName("head")[0].appendChild(style); + } + + } else { + + // retain the userDateWindow if requested + if (dygraph.userDateWindow != null + && attrs.retainDateWindow == true) { + attrs.dateWindow = dygraph.xAxisRange(); + } + + // remove it from groups if it's there + if (x.group != null && groups[x.group] != null) { + var index = groups[x.group].indexOf(dygraph); + if (index != -1) + groups[x.group].splice(index, 1); + } + + // destroy the existing dygraph + dygraph.destroy(); + dygraph = null; + } + + // create the dygraph and add it to it's group (if any) + dygraph = thiz.dygraph = new Dygraph(el, attrs.file, attrs); + dygraph.userDateWindow = attrs.dateWindow; + if (x.group != null) + groups[x.group].push(dygraph); + + // add shiny inputs for date window and click + if (HTMLWidgets.shinyMode) { + var isDate = x.format == "date"; + this.addClickShinyInput(el.id, isDate); + this.addDateWindowShinyInput(el.id, isDate); + } + + // set annotations + if (x.annotations != null) { + dygraph.ready(function() { + if (x.format == "date") { + x.annotations.map(function(annotation) { + var date = thiz.normalizeDateValue(x.scale, annotation.x, x.fixedtz); + annotation.x = date.getTime(); + }); + } + dygraph.setAnnotations(x.annotations); + }); + } + + }, + + customDateTickerFixedTZ : function(tz){ + return function(t,e,a,i,r) { + var a=Dygraph.pickDateTickGranularity(t,e,a,i); + if(a >= 0){ + + var n=i("axisLabelFormatter"), + o=i("labelsUTC"), + s=o?Dygraph.DateAccessorsUTC:Dygraph.DateAccessorsLocal; + l=Dygraph.TICK_PLACEMENT[a].datefield; + h=Dygraph.TICK_PLACEMENT[a].step; + 
p=Dygraph.TICK_PLACEMENT[a].spacing; + + var y = []; + var d = moment(t); + d.tz(tz); + d.millisecond(0); + + if(l > Dygraph.DATEFIELD_M){ + var x; + if (l === Dygraph.DATEFIELD_SS) { // seconds + x = d.second(); + d.second(x - x % h); + } else if(l === Dygraph.DATEFIELD_MM){ + d.second(0) + x = d.minute(); + d.minute(x - x % h); + } else if(l === Dygraph.DATEFIELD_HH){ + d.second(0); + d.minute(0); + x = d.hour(); + d.hour(x - x % h); + } else if(l === Dygraph.DATEFIELD_D){ + d.second(0); + d.minute(0); + d.hour(0); + if (h == 7) { // one week + d.startOf('week'); + } + } + + v = d.valueOf(); + _=moment(v).tz(tz); + + // For spacings coarser than two-hourly, we want to ignore daylight + // savings transitions to get consistent ticks. For finer-grained ticks, + // it's essential to show the DST transition in all its messiness. + var start_offset_min = moment(v).tz(tz).zone(); + var check_dst = (p >= Dygraph.TICK_PLACEMENT[Dygraph.TWO_HOURLY].spacing); + + if(a<=Dygraph.HOURLY){ + for(t>v&&(v+=p,_=moment(v).tz(tz));e>=v;){ + y.push({v:v,label:n(_,a,i,r)}); + v+=p; + _=moment(v).tz(tz); + } + }else{ + for(t>v&&(v+=p,_=moment(v).tz(tz));e>=v;){ + + // This ensures that we stay on the same hourly "rhythm" across + // daylight savings transitions. Without this, the ticks could get off + // by an hour. See tests/daylight-savings.html or issue 147. + if (check_dst && _.zone() != start_offset_min) { + var delta_min = _.zone() - start_offset_min; + v += delta_min * 60 * 1000; + _= moment(v).tz(tz); + start_offset_min = _.zone(); + + // Check whether we've backed into the previous timezone again. + // This can happen during a "spring forward" transition. In this case, + // it's best to skip this tick altogether (we may be shooting for a + // non-existent time like the 2AM that's skipped) and go to the next + // one. 
+ if (moment(v + p).tz(tz).zone() != start_offset_min) { + v += p; + _= moment(v).tz(tz); + start_offset_min = _.zone(); + } + } + + (a>=Dygraph.DAILY||_.get('hour')%h===0)&&y.push({v:v,label:n(_,a,i,r)}); + v+=p; + _=moment(v).tz(tz); + } + } + }else{ + var start_year = moment(t).tz(tz).year(); + var end_year = moment(e).tz(tz).year(); + var start_month = moment(t).tz(tz).month(); + + if(l === Dygraph.DATEFIELD_M){ + var step_month = h; + for (var ii = start_year; ii <= end_year; ii++) { + for (var j = 0; j < 12;) { + var dt = moment(new Date(ii, j, 1)).tz(tz); + // fix some tz bug + dt.year(ii); + dt.month(j); + dt.date(1); + dt.hour(0); + v = dt.valueOf(); + y.push({v:v,label:n(moment(v).tz(tz),a,i,r)}); + j+=step_month; + } + } + }else{ + var step_year = h; + for (var ii = start_year; ii <= end_year;) { + var dt = moment(new Date(ii, 1, 1)).tz(tz); + // fix some tz bug + dt.year(ii); + dt.month(j); + dt.date(1); + dt.hour(0); + v = dt.valueOf(); + y.push({v:v,label:n(moment(v).tz(tz),a,i,r)}); + ii+=step_year; + } + } + } + return y; + }else{ + return []; + } + }; + }, + + xAxisLabelFormatterFixedTZ : function(tz){ + + return function dateAxisFormatter(date, granularity){ + var mmnt = moment(date).tz(tz); + if (granularity >= Dygraph.DECADAL){ + return mmnt.format('YYYY'); + }else{ + if(granularity >= Dygraph.MONTHLY){ + return mmnt.format('MMM YYYY'); + }else{ + var frac = mmnt.hour() * 3600 + mmnt.minute() * 60 + mmnt.second() + mmnt.millisecond(); + if (frac === 0 || granularity >= Dygraph.DAILY) { + return mmnt.format('DD MMM'); + } else { + if (mmnt.second()) { + return mmnt.format('HH:mm:ss'); + } else { + return mmnt.format('HH:mm'); + } + } + } + + } + } + }, + + xValueFormatterFixedTZ: function(scale, tz) { + + return function(millis) { + var mmnt = moment(millis).tz(tz); + if (scale == "yearly") + return mmnt.format('YYYY') + ' (' + mmnt.zoneAbbr() + ')'; + else if (scale == "quarterly") + return mmnt.fquarter(1) + ' (' + mmnt.zoneAbbr() + ')'; + else if (scale == "monthly") + return mmnt.format('MMM, YYYY')+ ' (' + mmnt.zoneAbbr() + ')'; + else if (scale == "daily" || scale == "weekly") + return mmnt.format('MMM, DD, YYYY')+ ' (' + mmnt.zoneAbbr() + ')'; + else + return mmnt.format('dddd, MMMM DD, YYYY HH:mm:ss')+ ' (' + mmnt.zoneAbbr() + ')'; + } + }, + + xValueFormatter: function(scale) { + + var monthNames = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", + "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]; + + return function(millis) { + var date = new Date(millis); + if (scale == "yearly") + return date.getFullYear(); + else if (scale == "quarterly") + return moment(millis).fquarter(1); + else if (scale == "monthly") + return monthNames[date.getMonth()] + ', ' + date.getFullYear(); + else if (scale == "daily" || scale == "weekly") + return monthNames[date.getMonth()] + ', ' + + date.getDate() + ', ' + + date.getFullYear(); + else + return date.toLocaleString(); + } + }, + + addZoomCallback: function(x) { + + // alias this + var thiz = this; + + // get attrs + var attrs = x.attrs; + + // check for an existing zoomCallback + var prevZoomCallback = attrs["zoomCallback"]; + + attrs.zoomCallback = function(minDate, maxDate, yRanges) { + + // call existing + if (prevZoomCallback) + prevZoomCallback(minDate, maxDate, yRanges); + + // record user date window (or lack thereof) + if (dygraph.xAxisExtremes()[0] != minDate || + dygraph.xAxisExtremes()[1] != maxDate) { + dygraph.userDateWindow = [minDate, maxDate]; + } else { + dygraph.userDateWindow = null; + } + + // record in group if 
necessary + if (x.group != null && groups[x.group] != null) { + var group = groups[x.group]; + for(var i = 0; i=0.1){ + var dashLength = dashArray[dashIndex++%dashCount]; + if (dashLength > distRemaining) dashLength = distRemaining; + var xStep = Math.sqrt( dashLength*dashLength / (1 + slope*slope) ); + if (dx<0) xStep = -xStep; + x += xStep + y += slope*xStep; + canvas[draw ? 'lineTo' : 'moveTo'](x,y); + distRemaining -= dashLength; + draw = !draw; + } + canvas.stroke(); + }, + + setFontSize: function(canvas, size) { + var cFont = canvas.font; + var parts = cFont.split(' '); + if (parts.length === 2) + canvas.font = size + 'px ' + parts[1]; + else if (parts.length === 3) + canvas.font = parts[0] + ' ' + size + 'px ' + parts[2]; + }, + + // Returns the value of a GET variable + queryVar: function(name) { + return decodeURI(window.location.search.replace( + new RegExp("^(?:.*[&\\?]" + + encodeURI(name).replace(/[\.\+\*]/g, "\\$&") + + "(?:\\=([^&]*))?)?.*$", "i"), + "$1")); + }, + + // We deal exclusively in UTC dates within R, however dygraphs deals + // exclusively in the local time zone. Therefore, in order to plot date + // labels that make sense to the user when we are dealing with days, + // months or years we need to convert the UTC date value to a local time + // value that "looks like" the equivilant UTC value. To do this we add the + // timezone offset to the UTC date. + // Don't use in case of fixedtz + normalizeDateValue: function(scale, value, fixedtz) { + var date = new Date(value); + if (scale != "minute" && scale != "hourly" && scale != "seconds" && !fixedtz) { + var localAsUTC = date.getTime() + (date.getTimezoneOffset() * 60000); + date = new Date(localAsUTC); + } + return date; + }, + + // safely detect rendering on a mobile phone + isMobilePhone: function() { + try + { + return ! 
window.matchMedia("only screen and (min-width: 768px)").matches; + } + catch(e) { + return false; + } + }, + + + resize: function(width, height) { + if (dygraph) + dygraph.resize(); + }, + + // export dygraph so other code can get a hold of it + dygraph: null + + }; + }, + + // track groups globally + groups: {} + +}); + diff --git a/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/fontawesome/fontawesome-webfont.ttf b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/fontawesome/fontawesome-webfont.ttf new file mode 100644 index 000000000..35acda2fa Binary files /dev/null and b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/fontawesome/fontawesome-webfont.ttf differ diff --git a/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-bookdown.css b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-bookdown.css new file mode 100644 index 000000000..8e5bb8a3c --- /dev/null +++ b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-bookdown.css @@ -0,0 +1,99 @@ +.book .book-header h1 { + padding-left: 20px; + padding-right: 20px; +} +.book .book-header.fixed { + position: fixed; + right: 0; + top: 0; + left: 0; + border-bottom: 1px solid rgba(0,0,0,.07); +} +span.search-highlight { + background-color: #ffff88; +} +@media (min-width: 600px) { + .book.with-summary .book-header.fixed { + left: 300px; + } +} +@media (max-width: 1240px) { + .book .book-body.fixed { + top: 50px; + } + .book .book-body.fixed .body-inner { + top: auto; + } +} +@media (max-width: 600px) { + .book.with-summary .book-header.fixed { + left: calc(100% - 60px); + min-width: 300px; + } + .book.with-summary .book-body { + transform: none; + left: calc(100% - 60px); + min-width: 300px; + } + .book .book-body.fixed { + top: 0; + } +} + +.book .book-body.fixed .body-inner { + top: 50px; +} +.book .book-body .page-wrapper .page-inner section.normal sub, .book .book-body .page-wrapper .page-inner section.normal sup { + font-size: 85%; +} + +@media print { + .book .book-summary, .book .book-body .book-header, .fa { + display: none !important; + } + .book .book-body.fixed { + left: 0px; + } + .book .book-body,.book .book-body .body-inner, .book.with-summary { + overflow: visible !important; + } +} +.kable_wrapper { + border-spacing: 20px 0; + border-collapse: separate; + border: none; + margin: auto; +} +.kable_wrapper > tbody > tr > td { + vertical-align: top; +} +.book .book-body .page-wrapper .page-inner section.normal table tr.header { + border-top-width: 2px; +} +.book .book-body .page-wrapper .page-inner section.normal table tr:last-child td { + border-bottom-width: 2px; +} +.book .book-body .page-wrapper .page-inner section.normal table td, .book .book-body .page-wrapper .page-inner section.normal table th { + border-left: none; + border-right: none; +} +.book .book-body .page-wrapper .page-inner section.normal table.kable_wrapper > tbody > tr, .book .book-body .page-wrapper .page-inner section.normal table.kable_wrapper > tbody > tr > td { + border-top: none; +} +.book .book-body .page-wrapper .page-inner section.normal table.kable_wrapper > tbody > tr:last-child > td { + border-bottom: none; +} + +div.theorem, div.lemma, div.corollary, div.proposition, div.conjecture { + font-style: italic; +} +span.theorem, span.lemma, span.corollary, span.proposition, span.conjecture { + font-style: normal; +} +div.proof:after { + content: "\25a2"; + float: right; +} +.header-section-number { + padding-right: .5em; +} diff --git a/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-fontsettings.css 
b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-fontsettings.css new file mode 100644 index 000000000..87236b4c0 --- /dev/null +++ b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-fontsettings.css @@ -0,0 +1,292 @@ +/* + * Theme 1 + */ +.color-theme-1 .dropdown-menu { + background-color: #111111; + border-color: #7e888b; +} +.color-theme-1 .dropdown-menu .dropdown-caret .caret-inner { + border-bottom: 9px solid #111111; +} +.color-theme-1 .dropdown-menu .buttons { + border-color: #7e888b; +} +.color-theme-1 .dropdown-menu .button { + color: #afa790; +} +.color-theme-1 .dropdown-menu .button:hover { + color: #73553c; +} +/* + * Theme 2 + */ +.color-theme-2 .dropdown-menu { + background-color: #2d3143; + border-color: #272a3a; +} +.color-theme-2 .dropdown-menu .dropdown-caret .caret-inner { + border-bottom: 9px solid #2d3143; +} +.color-theme-2 .dropdown-menu .buttons { + border-color: #272a3a; +} +.color-theme-2 .dropdown-menu .button { + color: #62677f; +} +.color-theme-2 .dropdown-menu .button:hover { + color: #f4f4f5; +} +.book .book-header .font-settings .font-enlarge { + line-height: 30px; + font-size: 1.4em; +} +.book .book-header .font-settings .font-reduce { + line-height: 30px; + font-size: 1em; +} +.book.color-theme-1 .book-body { + color: #704214; + background: #f3eacb; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section { + background: #f3eacb; +} +.book.color-theme-2 .book-body { + color: #bdcadb; + background: #1c1f2b; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section { + background: #1c1f2b; +} +.book.font-size-0 .book-body .page-inner section { + font-size: 1.2rem; +} +.book.font-size-1 .book-body .page-inner section { + font-size: 1.4rem; +} +.book.font-size-2 .book-body .page-inner section { + font-size: 1.6rem; +} +.book.font-size-3 .book-body .page-inner section { + font-size: 2.2rem; +} +.book.font-size-4 .book-body .page-inner section { + font-size: 4rem; +} +.book.font-family-0 { + font-family: Georgia, serif; +} +.book.font-family-1 { + font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal { + color: #704214; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal a { + color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h2, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h3, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h4, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h5, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h6 { + color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h2 { + border-color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h6 { + color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal hr { + background-color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal blockquote { + border-color: #c4b29f; + opacity: 0.9; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code { + background: #fdf6e3; + color: #657b83; + border-color: #f8df9c; +} 
+.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal .highlight { + background-color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table th, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table td { + border-color: #f5d06c; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table tr { + color: inherit; + background-color: #fdf6e3; + border-color: #444444; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table tr:nth-child(2n) { + background-color: #fbeecb; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal { + color: #bdcadb; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal a { + color: #3eb1d0; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h2, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h3, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h4, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h5, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h6 { + color: #fffffa; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h2 { + border-color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h6 { + color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal hr { + background-color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal blockquote { + border-color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code { + color: #9dbed8; + background: #2d3143; + border-color: #2d3143; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal .highlight { + background-color: #282a39; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table th, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table td { + border-color: #3b3f54; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table tr { + color: #b6c2d2; + background-color: #2d3143; + border-color: #3b3f54; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table tr:nth-child(2n) { + background-color: #35394b; +} +.book.color-theme-1 .book-header { + color: #afa790; + background: transparent; +} +.book.color-theme-1 .book-header .btn { + color: #afa790; +} +.book.color-theme-1 .book-header .btn:hover { + color: #73553c; + background: none; +} +.book.color-theme-1 .book-header h1 { + color: #704214; +} +.book.color-theme-2 .book-header { + color: #7e888b; + background: transparent; +} +.book.color-theme-2 .book-header .btn { + color: #3b3f54; +} +.book.color-theme-2 .book-header .btn:hover { + color: #fffff5; + background: none; +} +.book.color-theme-2 .book-header h1 { + color: #bdcadb; +} +.book.color-theme-1 .book-body .navigation { + color: #afa790; +} +.book.color-theme-1 .book-body .navigation:hover { + color: #73553c; +} +.book.color-theme-2 .book-body .navigation { + color: #383f52; +} +.book.color-theme-2 .book-body .navigation:hover { + color: #fffff5; +} +/* + * Theme 1 + */ +.book.color-theme-1 .book-summary { + color: 
#afa790; + background: #111111; + border-right: 1px solid rgba(0, 0, 0, 0.07); +} +.book.color-theme-1 .book-summary .book-search { + background: transparent; +} +.book.color-theme-1 .book-summary .book-search input, +.book.color-theme-1 .book-summary .book-search input:focus { + border: 1px solid transparent; +} +.book.color-theme-1 .book-summary ul.summary li.divider { + background: #7e888b; + box-shadow: none; +} +.book.color-theme-1 .book-summary ul.summary li i.fa-check { + color: #33cc33; +} +.book.color-theme-1 .book-summary ul.summary li.done > a { + color: #877f6a; +} +.book.color-theme-1 .book-summary ul.summary li a, +.book.color-theme-1 .book-summary ul.summary li span { + color: #877f6a; + background: transparent; + font-weight: normal; +} +.book.color-theme-1 .book-summary ul.summary li.active > a, +.book.color-theme-1 .book-summary ul.summary li a:hover { + color: #704214; + background: transparent; + font-weight: normal; +} +/* + * Theme 2 + */ +.book.color-theme-2 .book-summary { + color: #bcc1d2; + background: #2d3143; + border-right: none; +} +.book.color-theme-2 .book-summary .book-search { + background: transparent; +} +.book.color-theme-2 .book-summary .book-search input, +.book.color-theme-2 .book-summary .book-search input:focus { + border: 1px solid transparent; +} +.book.color-theme-2 .book-summary ul.summary li.divider { + background: #272a3a; + box-shadow: none; +} +.book.color-theme-2 .book-summary ul.summary li i.fa-check { + color: #33cc33; +} +.book.color-theme-2 .book-summary ul.summary li.done > a { + color: #62687f; +} +.book.color-theme-2 .book-summary ul.summary li a, +.book.color-theme-2 .book-summary ul.summary li span { + color: #c1c6d7; + background: transparent; + font-weight: 600; +} +.book.color-theme-2 .book-summary ul.summary li.active > a, +.book.color-theme-2 .book-summary ul.summary li a:hover { + color: #f4f4f5; + background: #252737; + font-weight: 600; +} diff --git a/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-highlight.css b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-highlight.css new file mode 100644 index 000000000..2aabd3deb --- /dev/null +++ b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-highlight.css @@ -0,0 +1,426 @@ +.book .book-body .page-wrapper .page-inner section.normal pre, +.book .book-body .page-wrapper .page-inner section.normal code { + /* http://jmblog.github.com/color-themes-for-google-code-highlightjs */ + /* Tomorrow Comment */ + /* Tomorrow Red */ + /* Tomorrow Orange */ + /* Tomorrow Yellow */ + /* Tomorrow Green */ + /* Tomorrow Aqua */ + /* Tomorrow Blue */ + /* Tomorrow Purple */ +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-comment, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-comment, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-title { + color: #8e908c; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-variable, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-variable, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-attribute, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-attribute, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-tag, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-tag, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-regexp, +.book .book-body .page-wrapper 
.page-inner section.normal code .hljs-regexp, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-constant, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-constant, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-tag .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-tag .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-pi, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-pi, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal pre .html .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal code .html .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-id, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-id, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-class, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-class, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-pseudo, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-pseudo { + color: #c82829; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-number, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-number, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-pragma, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-pragma, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-built_in, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-built_in, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-literal, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-literal, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-params, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-params, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-constant, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-constant { + color: #f5871f; +} +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-class .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-class .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-rules .hljs-attribute, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-rules .hljs-attribute { + color: #eab700; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-string, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-string, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-value, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-value, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-inheritance, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-inheritance, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-header, +.book .book-body .page-wrapper .page-inner section.normal code 
.hljs-header, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-symbol, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-symbol, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + color: #718c00; +} +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-hexcolor, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-hexcolor { + color: #3e999f; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-function, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-function, +.book .book-body .page-wrapper .page-inner section.normal pre .python .hljs-decorator, +.book .book-body .page-wrapper .page-inner section.normal code .python .hljs-decorator, +.book .book-body .page-wrapper .page-inner section.normal pre .python .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .python .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-function .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-function .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-title .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-title .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal pre .perl .hljs-sub, +.book .book-body .page-wrapper .page-inner section.normal code .perl .hljs-sub, +.book .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .coffeescript .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .coffeescript .hljs-title { + color: #4271ae; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-function, +.book .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-function { + color: #8959a8; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs, +.book .book-body .page-wrapper .page-inner section.normal code .hljs { + display: block; + background: white; + color: #4d4d4c; + padding: 0.5em; +} +.book .book-body .page-wrapper .page-inner section.normal pre .coffeescript .javascript, +.book .book-body .page-wrapper .page-inner section.normal code .coffeescript .javascript, +.book .book-body .page-wrapper .page-inner section.normal pre .javascript .xml, +.book .book-body .page-wrapper .page-inner section.normal code .javascript .xml, +.book .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .javascript, +.book .book-body .page-wrapper .page-inner section.normal code .xml .javascript, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .vbscript, +.book .book-body .page-wrapper .page-inner section.normal code .xml .vbscript, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .css, +.book .book-body .page-wrapper .page-inner section.normal code .xml .css, 
+.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + opacity: 0.5; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code { + /* + +Orginal Style from ethanschoonover.com/solarized (c) Jeremy Hull + +*/ + /* Solarized Green */ + /* Solarized Cyan */ + /* Solarized Blue */ + /* Solarized Yellow */ + /* Solarized Orange */ + /* Solarized Red */ + /* Solarized Violet */ +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs { + display: block; + padding: 0.5em; + background: #fdf6e3; + color: #657b83; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-template_comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-template_comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .diff .hljs-header, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .diff .hljs-header, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-doctype, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-doctype, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-pi, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-pi, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .lisp .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .lisp .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-javadoc, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-javadoc { + color: #93a1a1; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-winutils, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-winutils, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .method, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .method, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-addition, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-addition, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-tag, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .css .hljs-tag, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-request, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-request, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-status, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-status, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre 
.nginx .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .nginx .hljs-title { + color: #859900; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-command, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-command, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-tag .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-tag .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-rules .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-rules .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-phpdoc, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-phpdoc, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-regexp, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-regexp, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-hexcolor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-hexcolor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-link_url, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-link_url { + color: #2aa198; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-localvars, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-localvars, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-chunk, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-chunk, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-decorator, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-decorator, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-built_in, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-built_in, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-identifier, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-identifier, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .vhdl .hljs-literal, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .vhdl .hljs-literal, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-id, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-id, 
+.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-function, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .css .hljs-function { + color: #268bd2; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-attribute, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-attribute, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-variable, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-variable, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .lisp .hljs-body, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .lisp .hljs-body, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .smalltalk .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .smalltalk .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-constant, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-constant, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-class .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-class .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-parent, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-parent, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .haskell .hljs-type, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .haskell .hljs-type, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-link_reference, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-link_reference { + color: #b58900; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-pragma, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-pragma, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-shebang, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-shebang, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-symbol, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-symbol, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-symbol .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-symbol .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .diff .hljs-change, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .diff .hljs-change, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-special, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-special, 
+.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-attr_selector, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-attr_selector, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-subst, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-subst, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-cdata, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-cdata, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .clojure .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .clojure .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-pseudo, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .css .hljs-pseudo, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-header, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-header { + color: #cb4b16; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-deletion, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-deletion, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-important, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-important { + color: #dc322f; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-link_label, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-link_label { + color: #6c71c4; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula { + background: #eee8d5; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code { + /* Tomorrow Night Bright Theme */ + /* Original theme - https://github.com/chriskempson/tomorrow-theme */ + /* http://jmblog.github.com/color-themes-for-google-code-highlightjs */ + /* Tomorrow Comment */ + /* Tomorrow Red */ + /* Tomorrow Orange */ + /* Tomorrow Yellow */ + /* Tomorrow Green */ + /* Tomorrow Aqua */ + /* Tomorrow Blue */ + /* Tomorrow Purple */ +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-comment, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-comment, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-title { + color: #969896; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-variable, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-variable, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-attribute, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-attribute, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-tag, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-tag, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-regexp, 
+.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-regexp, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-constant, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-constant, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-tag .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-tag .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-pi, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-pi, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .html .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .html .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-id, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-id, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-class, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-class, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-pseudo, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-pseudo { + color: #d54e53; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-number, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-number, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-pragma, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-pragma, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-built_in, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-built_in, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-literal, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-literal, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-params, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-params, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-constant, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-constant { + color: #e78c45; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-class .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-class .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-rules .hljs-attribute, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-rules .hljs-attribute { + color: #e7c547; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-string, 
+.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-string, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-value, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-value, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-inheritance, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-inheritance, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-header, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-header, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-symbol, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-symbol, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + color: #b9ca4a; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-hexcolor, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-hexcolor { + color: #70c0b1; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-function, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-function, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .python .hljs-decorator, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .python .hljs-decorator, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .python .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .python .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-function .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-function .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-title .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-title .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .perl .hljs-sub, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .perl .hljs-sub, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .coffeescript .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .coffeescript .hljs-title { + color: #7aa6da; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-function, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-function { + color: #c397d8; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs { + display: block; + background: 
black; + color: #eaeaea; + padding: 0.5em; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .coffeescript .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .coffeescript .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .javascript .xml, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .javascript .xml, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .vbscript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .vbscript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .css, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .css, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + opacity: 0.5; +} diff --git a/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-search.css b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-search.css new file mode 100644 index 000000000..d7ff2d991 --- /dev/null +++ b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-search.css @@ -0,0 +1,28 @@ +.book .book-summary .book-search { + padding: 6px; + background: transparent; + position: absolute; + top: -50px; + left: 0px; + right: 0px; + transition: top 0.5s ease; +} +.book .book-summary .book-search input, +.book .book-summary .book-search input:focus, +.book .book-summary .book-search input:hover { + width: 100%; + background: transparent; + border: 1px solid #ccc; + box-shadow: none; + outline: none; + line-height: 22px; + padding: 7px 4px; + color: inherit; + box-sizing: border-box; +} +.book.with-search .book-summary .book-search { + top: 0px; +} +.book.with-search .book-summary ul.summary { + top: 50px; +} diff --git a/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-table.css b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-table.css new file mode 100644 index 000000000..7fba1b9fb --- /dev/null +++ b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-table.css @@ -0,0 +1 @@ +.book .book-body .page-wrapper .page-inner section.normal table{display:table;width:100%;border-collapse:collapse;border-spacing:0;overflow:auto}.book .book-body .page-wrapper .page-inner section.normal table td,.book .book-body .page-wrapper .page-inner section.normal table th{padding:6px 13px;border:1px solid #ddd}.book .book-body .page-wrapper .page-inner section.normal table tr{background-color:#fff;border-top:1px solid #ccc}.book .book-body .page-wrapper .page-inner section.normal table tr:nth-child(2n){background-color:#f8f8f8}.book .book-body .page-wrapper .page-inner section.normal table th{font-weight:700} diff --git a/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/style.css b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/style.css new file mode 100644 index 000000000..b89689209 --- /dev/null +++ b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/style.css @@ -0,0 +1,10 @@ +/*! 
normalize.css v2.1.0 | MIT License | git.io/normalize */img,legend{border:0}*,.fa{-webkit-font-smoothing:antialiased}.fa-ul>li,sub,sup{position:relative}.book .book-body .page-wrapper .page-inner section.normal hr:after,.book-langs-index .inner .languages:after,.buttons:after,.dropdown-menu .buttons:after{clear:both}body,html{-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%}article,aside,details,figcaption,figure,footer,header,hgroup,main,nav,section,summary{display:block}audio,canvas,video{display:inline-block}.hidden,[hidden]{display:none}audio:not([controls]){display:none;height:0}html{font-family:sans-serif}body,figure{margin:0}a:focus{outline:dotted thin}a:active,a:hover{outline:0}h1{font-size:2em;margin:.67em 0}abbr[title]{border-bottom:1px dotted}b,strong{font-weight:700}dfn{font-style:italic}hr{-moz-box-sizing:content-box;box-sizing:content-box;height:0}mark{background:#ff0;color:#000}code,kbd,pre,samp{font-family:monospace,serif;font-size:1em}pre{white-space:pre-wrap}q{quotes:"\201C" "\201D" "\2018" "\2019"}small{font-size:80%}sub,sup{font-size:75%;line-height:0;vertical-align:baseline}sup{top:-.5em}sub{bottom:-.25em}svg:not(:root){overflow:hidden}fieldset{border:1px solid silver;margin:0 2px;padding:.35em .625em .75em}legend{padding:0}button,input,select,textarea{font-family:inherit;font-size:100%;margin:0}button,input{line-height:normal}button,select{text-transform:none}button,html input[type=button],input[type=reset],input[type=submit]{-webkit-appearance:button;cursor:pointer}button[disabled],html input[disabled]{cursor:default}input[type=checkbox],input[type=radio]{box-sizing:border-box;padding:0}input[type=search]{-webkit-appearance:textfield;-moz-box-sizing:content-box;-webkit-box-sizing:content-box;box-sizing:content-box}input[type=search]::-webkit-search-cancel-button{margin-right:10px;}button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0}textarea{overflow:auto;vertical-align:top}table{border-collapse:collapse;border-spacing:0}/*! + * Preboot v2 + * + * Open sourced under MIT license by @mdo. + * Some variables and mixins from Bootstrap (Apache 2 license). + */.link-inherit,.link-inherit:focus,.link-inherit:hover{color:inherit}.fa,.fa-stack{display:inline-block}/*! 
+ * Font Awesome 4.1.0 by @davegandy - http://fontawesome.io - @fontawesome + * License - http://fontawesome.io/license (Font: SIL OFL 1.1, CSS: MIT License) + */@font-face{font-family:FontAwesome;src:url(./fontawesome/fontawesome-webfont.ttf?v=4.1.0) format('truetype');font-weight:400;font-style:normal}.fa{font-family:FontAwesome;font-style:normal;font-weight:400;line-height:1;-moz-osx-font-smoothing:grayscale}.book .book-header,.book .book-summary{font-family:"Helvetica Neue",Helvetica,Arial,sans-serif}.fa-lg{font-size:1.33333333em;line-height:.75em;vertical-align:-15%}.fa-2x{font-size:2em}.fa-3x{font-size:3em}.fa-4x{font-size:4em}.fa-5x{font-size:5em}.fa-fw{width:1.28571429em;text-align:center}.fa-ul{padding-left:0;margin-left:2.14285714em;list-style-type:none}.fa-li{position:absolute;left:-2.14285714em;width:2.14285714em;top:.14285714em;text-align:center}.fa-li.fa-lg{left:-1.85714286em}.fa-border{padding:.2em .25em .15em;border:.08em solid #eee;border-radius:.1em}.pull-right{float:right}.pull-left{float:left}.fa.pull-left{margin-right:.3em}.fa.pull-right{margin-left:.3em}.fa-spin{-webkit-animation:spin 2s infinite linear;-moz-animation:spin 2s infinite linear;-o-animation:spin 2s infinite linear;animation:spin 2s infinite linear}@-moz-keyframes spin{0%{-moz-transform:rotate(0)}100%{-moz-transform:rotate(359deg)}}@-webkit-keyframes spin{0%{-webkit-transform:rotate(0)}100%{-webkit-transform:rotate(359deg)}}@-o-keyframes spin{0%{-o-transform:rotate(0)}100%{-o-transform:rotate(359deg)}}@keyframes spin{0%{-webkit-transform:rotate(0);transform:rotate(0)}100%{-webkit-transform:rotate(359deg);transform:rotate(359deg)}}.fa-rotate-90{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=1);-webkit-transform:rotate(90deg);-moz-transform:rotate(90deg);-ms-transform:rotate(90deg);-o-transform:rotate(90deg);transform:rotate(90deg)}.fa-rotate-180{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=2);-webkit-transform:rotate(180deg);-moz-transform:rotate(180deg);-ms-transform:rotate(180deg);-o-transform:rotate(180deg);transform:rotate(180deg)}.fa-rotate-270{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=3);-webkit-transform:rotate(270deg);-moz-transform:rotate(270deg);-ms-transform:rotate(270deg);-o-transform:rotate(270deg);transform:rotate(270deg)}.fa-flip-horizontal{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=0, mirror=1);-webkit-transform:scale(-1,1);-moz-transform:scale(-1,1);-ms-transform:scale(-1,1);-o-transform:scale(-1,1);transform:scale(-1,1)}.fa-flip-vertical{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=2, 
mirror=1);-webkit-transform:scale(1,-1);-moz-transform:scale(1,-1);-ms-transform:scale(1,-1);-o-transform:scale(1,-1);transform:scale(1,-1)}.fa-stack{position:relative;width:2em;height:2em;line-height:2em;vertical-align:middle}.fa-stack-1x,.fa-stack-2x{position:absolute;left:0;width:100%;text-align:center}.fa-stack-1x{line-height:inherit}.fa-stack-2x{font-size:2em}.fa-inverse{color:#fff}.fa-glass:before{content:"\f000"}.fa-music:before{content:"\f001"}.fa-search:before{content:"\f002"}.fa-envelope-o:before{content:"\f003"}.fa-heart:before{content:"\f004"}.fa-star:before{content:"\f005"}.fa-star-o:before{content:"\f006"}.fa-user:before{content:"\f007"}.fa-film:before{content:"\f008"}.fa-th-large:before{content:"\f009"}.fa-th:before{content:"\f00a"}.fa-th-list:before{content:"\f00b"}.fa-check:before{content:"\f00c"}.fa-times:before{content:"\f00d"}.fa-search-plus:before{content:"\f00e"}.fa-search-minus:before{content:"\f010"}.fa-power-off:before{content:"\f011"}.fa-signal:before{content:"\f012"}.fa-cog:before,.fa-gear:before{content:"\f013"}.fa-trash-o:before{content:"\f014"}.fa-home:before{content:"\f015"}.fa-file-o:before{content:"\f016"}.fa-clock-o:before{content:"\f017"}.fa-road:before{content:"\f018"}.fa-download:before{content:"\f019"}.fa-arrow-circle-o-down:before{content:"\f01a"}.fa-arrow-circle-o-up:before{content:"\f01b"}.fa-inbox:before{content:"\f01c"}.fa-play-circle-o:before{content:"\f01d"}.fa-repeat:before,.fa-rotate-right:before{content:"\f01e"}.fa-refresh:before{content:"\f021"}.fa-list-alt:before{content:"\f022"}.fa-lock:before{content:"\f023"}.fa-flag:before{content:"\f024"}.fa-headphones:before{content:"\f025"}.fa-volume-off:before{content:"\f026"}.fa-volume-down:before{content:"\f027"}.fa-volume-up:before{content:"\f028"}.fa-qrcode:before{content:"\f029"}.fa-barcode:before{content:"\f02a"}.fa-tag:before{content:"\f02b"}.fa-tags:before{content:"\f02c"}.fa-book:before{content:"\f02d"}.fa-bookmark:before{content:"\f02e"}.fa-print:before{content:"\f02f"}.fa-camera:before{content:"\f030"}.fa-font:before{content:"\f031"}.fa-bold:before{content:"\f032"}.fa-italic:before{content:"\f033"}.fa-text-height:before{content:"\f034"}.fa-text-width:before{content:"\f035"}.fa-align-left:before{content:"\f036"}.fa-align-center:before{content:"\f037"}.fa-align-right:before{content:"\f038"}.fa-align-justify:before{content:"\f039"}.fa-list:before{content:"\f03a"}.fa-dedent:before,.fa-outdent:before{content:"\f03b"}.fa-indent:before{content:"\f03c"}.fa-video-camera:before{content:"\f03d"}.fa-image:before,.fa-photo:before,.fa-picture-o:before{content:"\f03e"}.fa-pencil:before{content:"\f040"}.fa-map-marker:before{content:"\f041"}.fa-adjust:before{content:"\f042"}.fa-tint:before{content:"\f043"}.fa-edit:before,.fa-pencil-square-o:before{content:"\f044"}.fa-share-square-o:before{content:"\f045"}.fa-check-square-o:before{content:"\f046"}.fa-arrows:before{content:"\f047"}.fa-step-backward:before{content:"\f048"}.fa-fast-backward:before{content:"\f049"}.fa-backward:before{content:"\f04a"}.fa-play:before{content:"\f04b"}.fa-pause:before{content:"\f04c"}.fa-stop:before{content:"\f04d"}.fa-forward:before{content:"\f04e"}.fa-fast-forward:before{content:"\f050"}.fa-step-forward:before{content:"\f051"}.fa-eject:before{content:"\f052"}.fa-chevron-left:before{content:"\f053"}.fa-chevron-right:before{content:"\f054"}.fa-plus-circle:before{content:"\f055"}.fa-minus-circle:before{content:"\f056"}.fa-times-circle:before{content:"\f057"}.fa-check-circle:before{content:"\f058"}.fa-question-circle:before{content:"\
f059"}.fa-info-circle:before{content:"\f05a"}.fa-crosshairs:before{content:"\f05b"}.fa-times-circle-o:before{content:"\f05c"}.fa-check-circle-o:before{content:"\f05d"}.fa-ban:before{content:"\f05e"}.fa-arrow-left:before{content:"\f060"}.fa-arrow-right:before{content:"\f061"}.fa-arrow-up:before{content:"\f062"}.fa-arrow-down:before{content:"\f063"}.fa-mail-forward:before,.fa-share:before{content:"\f064"}.fa-expand:before{content:"\f065"}.fa-compress:before{content:"\f066"}.fa-plus:before{content:"\f067"}.fa-minus:before{content:"\f068"}.fa-asterisk:before{content:"\f069"}.fa-exclamation-circle:before{content:"\f06a"}.fa-gift:before{content:"\f06b"}.fa-leaf:before{content:"\f06c"}.fa-fire:before{content:"\f06d"}.fa-eye:before{content:"\f06e"}.fa-eye-slash:before{content:"\f070"}.fa-exclamation-triangle:before,.fa-warning:before{content:"\f071"}.fa-plane:before{content:"\f072"}.fa-calendar:before{content:"\f073"}.fa-random:before{content:"\f074"}.fa-comment:before{content:"\f075"}.fa-magnet:before{content:"\f076"}.fa-chevron-up:before{content:"\f077"}.fa-chevron-down:before{content:"\f078"}.fa-retweet:before{content:"\f079"}.fa-shopping-cart:before{content:"\f07a"}.fa-folder:before{content:"\f07b"}.fa-folder-open:before{content:"\f07c"}.fa-arrows-v:before{content:"\f07d"}.fa-arrows-h:before{content:"\f07e"}.fa-bar-chart-o:before{content:"\f080"}.fa-twitter-square:before{content:"\f081"}.fa-facebook-square:before{content:"\f082"}.fa-camera-retro:before{content:"\f083"}.fa-key:before{content:"\f084"}.fa-cogs:before,.fa-gears:before{content:"\f085"}.fa-comments:before{content:"\f086"}.fa-thumbs-o-up:before{content:"\f087"}.fa-thumbs-o-down:before{content:"\f088"}.fa-star-half:before{content:"\f089"}.fa-heart-o:before{content:"\f08a"}.fa-sign-out:before{content:"\f08b"}.fa-linkedin-square:before{content:"\f08c"}.fa-thumb-tack:before{content:"\f08d"}.fa-external-link:before{content:"\f08e"}.fa-sign-in:before{content:"\f090"}.fa-trophy:before{content:"\f091"}.fa-github-square:before{content:"\f092"}.fa-upload:before{content:"\f093"}.fa-lemon-o:before{content:"\f094"}.fa-phone:before{content:"\f095"}.fa-square-o:before{content:"\f096"}.fa-bookmark-o:before{content:"\f097"}.fa-phone-square:before{content:"\f098"}.fa-twitter:before{content:"\f099"}.fa-facebook:before{content:"\f09a"}.fa-github:before{content:"\f09b"}.fa-unlock:before{content:"\f09c"}.fa-credit-card:before{content:"\f09d"}.fa-rss:before{content:"\f09e"}.fa-hdd-o:before{content:"\f0a0"}.fa-bullhorn:before{content:"\f0a1"}.fa-bell:before{content:"\f0f3"}.fa-certificate:before{content:"\f0a3"}.fa-hand-o-right:before{content:"\f0a4"}.fa-hand-o-left:before{content:"\f0a5"}.fa-hand-o-up:before{content:"\f0a6"}.fa-hand-o-down:before{content:"\f0a7"}.fa-arrow-circle-left:before{content:"\f0a8"}.fa-arrow-circle-right:before{content:"\f0a9"}.fa-arrow-circle-up:before{content:"\f0aa"}.fa-arrow-circle-down:before{content:"\f0ab"}.fa-globe:before{content:"\f0ac"}.fa-wrench:before{content:"\f0ad"}.fa-tasks:before{content:"\f0ae"}.fa-filter:before{content:"\f0b0"}.fa-briefcase:before{content:"\f0b1"}.fa-arrows-alt:before{content:"\f0b2"}.fa-group:before,.fa-users:before{content:"\f0c0"}.fa-chain:before,.fa-link:before{content:"\f0c1"}.fa-cloud:before{content:"\f0c2"}.fa-flask:before{content:"\f0c3"}.fa-cut:before,.fa-scissors:before{content:"\f0c4"}.fa-copy:before,.fa-files-o:before{content:"\f0c5"}.fa-paperclip:before{content:"\f0c6"}.fa-floppy-o:before,.fa-save:before{content:"\f0c7"}.fa-square:before{content:"\f0c8"}.fa-bars:before,.fa-navicon:befo
re,.fa-reorder:before{content:"\f0c9"}.fa-list-ul:before{content:"\f0ca"}.fa-list-ol:before{content:"\f0cb"}.fa-strikethrough:before{content:"\f0cc"}.fa-underline:before{content:"\f0cd"}.fa-table:before{content:"\f0ce"}.fa-magic:before{content:"\f0d0"}.fa-truck:before{content:"\f0d1"}.fa-pinterest:before{content:"\f0d2"}.fa-pinterest-square:before{content:"\f0d3"}.fa-google-plus-square:before{content:"\f0d4"}.fa-google-plus:before{content:"\f0d5"}.fa-money:before{content:"\f0d6"}.fa-caret-down:before{content:"\f0d7"}.fa-caret-up:before{content:"\f0d8"}.fa-caret-left:before{content:"\f0d9"}.fa-caret-right:before{content:"\f0da"}.fa-columns:before{content:"\f0db"}.fa-sort:before,.fa-unsorted:before{content:"\f0dc"}.fa-sort-desc:before,.fa-sort-down:before{content:"\f0dd"}.fa-sort-asc:before,.fa-sort-up:before{content:"\f0de"}.fa-envelope:before{content:"\f0e0"}.fa-linkedin:before{content:"\f0e1"}.fa-rotate-left:before,.fa-undo:before{content:"\f0e2"}.fa-gavel:before,.fa-legal:before{content:"\f0e3"}.fa-dashboard:before,.fa-tachometer:before{content:"\f0e4"}.fa-comment-o:before{content:"\f0e5"}.fa-comments-o:before{content:"\f0e6"}.fa-bolt:before,.fa-flash:before{content:"\f0e7"}.fa-sitemap:before{content:"\f0e8"}.fa-umbrella:before{content:"\f0e9"}.fa-clipboard:before,.fa-paste:before{content:"\f0ea"}.fa-lightbulb-o:before{content:"\f0eb"}.fa-exchange:before{content:"\f0ec"}.fa-cloud-download:before{content:"\f0ed"}.fa-cloud-upload:before{content:"\f0ee"}.fa-user-md:before{content:"\f0f0"}.fa-stethoscope:before{content:"\f0f1"}.fa-suitcase:before{content:"\f0f2"}.fa-bell-o:before{content:"\f0a2"}.fa-coffee:before{content:"\f0f4"}.fa-cutlery:before{content:"\f0f5"}.fa-file-text-o:before{content:"\f0f6"}.fa-building-o:before{content:"\f0f7"}.fa-hospital-o:before{content:"\f0f8"}.fa-ambulance:before{content:"\f0f9"}.fa-medkit:before{content:"\f0fa"}.fa-fighter-jet:before{content:"\f0fb"}.fa-beer:before{content:"\f0fc"}.fa-h-square:before{content:"\f0fd"}.fa-plus-square:before{content:"\f0fe"}.fa-angle-double-left:before{content:"\f100"}.fa-angle-double-right:before{content:"\f101"}.fa-angle-double-up:before{content:"\f102"}.fa-angle-double-down:before{content:"\f103"}.fa-angle-left:before{content:"\f104"}.fa-angle-right:before{content:"\f105"}.fa-angle-up:before{content:"\f106"}.fa-angle-down:before{content:"\f107"}.fa-desktop:before{content:"\f108"}.fa-laptop:before{content:"\f109"}.fa-tablet:before{content:"\f10a"}.fa-mobile-phone:before,.fa-mobile:before{content:"\f10b"}.fa-circle-o:before{content:"\f10c"}.fa-quote-left:before{content:"\f10d"}.fa-quote-right:before{content:"\f10e"}.fa-spinner:before{content:"\f110"}.fa-circle:before{content:"\f111"}.fa-mail-reply:before,.fa-reply:before{content:"\f112"}.fa-github-alt:before{content:"\f113"}.fa-folder-o:before{content:"\f114"}.fa-folder-open-o:before{content:"\f115"}.fa-smile-o:before{content:"\f118"}.fa-frown-o:before{content:"\f119"}.fa-meh-o:before{content:"\f11a"}.fa-gamepad:before{content:"\f11b"}.fa-keyboard-o:before{content:"\f11c"}.fa-flag-o:before{content:"\f11d"}.fa-flag-checkered:before{content:"\f11e"}.fa-terminal:before{content:"\f120"}.fa-code:before{content:"\f121"}.fa-mail-reply-all:before,.fa-reply-all:before{content:"\f122"}.fa-star-half-empty:before,.fa-star-half-full:before,.fa-star-half-o:before{content:"\f123"}.fa-location-arrow:before{content:"\f124"}.fa-crop:before{content:"\f125"}.fa-code-fork:before{content:"\f126"}.fa-chain-broken:before,.fa-unlink:before{content:"\f127"}.fa-question:before{content:"\f128"}.fa-info:b
efore{content:"\f129"}.fa-exclamation:before{content:"\f12a"}.fa-superscript:before{content:"\f12b"}.fa-subscript:before{content:"\f12c"}.fa-eraser:before{content:"\f12d"}.fa-puzzle-piece:before{content:"\f12e"}.fa-microphone:before{content:"\f130"}.fa-microphone-slash:before{content:"\f131"}.fa-shield:before{content:"\f132"}.fa-calendar-o:before{content:"\f133"}.fa-fire-extinguisher:before{content:"\f134"}.fa-rocket:before{content:"\f135"}.fa-maxcdn:before{content:"\f136"}.fa-chevron-circle-left:before{content:"\f137"}.fa-chevron-circle-right:before{content:"\f138"}.fa-chevron-circle-up:before{content:"\f139"}.fa-chevron-circle-down:before{content:"\f13a"}.fa-html5:before{content:"\f13b"}.fa-css3:before{content:"\f13c"}.fa-anchor:before{content:"\f13d"}.fa-unlock-alt:before{content:"\f13e"}.fa-bullseye:before{content:"\f140"}.fa-ellipsis-h:before{content:"\f141"}.fa-ellipsis-v:before{content:"\f142"}.fa-rss-square:before{content:"\f143"}.fa-play-circle:before{content:"\f144"}.fa-ticket:before{content:"\f145"}.fa-minus-square:before{content:"\f146"}.fa-minus-square-o:before{content:"\f147"}.fa-level-up:before{content:"\f148"}.fa-level-down:before{content:"\f149"}.fa-check-square:before{content:"\f14a"}.fa-pencil-square:before{content:"\f14b"}.fa-external-link-square:before{content:"\f14c"}.fa-share-square:before{content:"\f14d"}.fa-compass:before{content:"\f14e"}.fa-caret-square-o-down:before,.fa-toggle-down:before{content:"\f150"}.fa-caret-square-o-up:before,.fa-toggle-up:before{content:"\f151"}.fa-caret-square-o-right:before,.fa-toggle-right:before{content:"\f152"}.fa-eur:before,.fa-euro:before{content:"\f153"}.fa-gbp:before{content:"\f154"}.fa-dollar:before,.fa-usd:before{content:"\f155"}.fa-inr:before,.fa-rupee:before{content:"\f156"}.fa-cny:before,.fa-jpy:before,.fa-rmb:before,.fa-yen:before{content:"\f157"}.fa-rouble:before,.fa-rub:before,.fa-ruble:before{content:"\f158"}.fa-krw:before,.fa-won:before{content:"\f159"}.fa-bitcoin:before,.fa-btc:before{content:"\f15a"}.fa-file:before{content:"\f15b"}.fa-file-text:before{content:"\f15c"}.fa-sort-alpha-asc:before{content:"\f15d"}.fa-sort-alpha-desc:before{content:"\f15e"}.fa-sort-amount-asc:before{content:"\f160"}.fa-sort-amount-desc:before{content:"\f161"}.fa-sort-numeric-asc:before{content:"\f162"}.fa-sort-numeric-desc:before{content:"\f163"}.fa-thumbs-up:before{content:"\f164"}.fa-thumbs-down:before{content:"\f165"}.fa-youtube-square:before{content:"\f166"}.fa-youtube:before{content:"\f167"}.fa-xing:before{content:"\f168"}.fa-xing-square:before{content:"\f169"}.fa-youtube-play:before{content:"\f16a"}.fa-dropbox:before{content:"\f16b"}.fa-stack-overflow:before{content:"\f16c"}.fa-instagram:before{content:"\f16d"}.fa-flickr:before{content:"\f16e"}.fa-adn:before{content:"\f170"}.fa-bitbucket:before{content:"\f171"}.fa-bitbucket-square:before{content:"\f172"}.fa-tumblr:before{content:"\f173"}.fa-tumblr-square:before{content:"\f174"}.fa-long-arrow-down:before{content:"\f175"}.fa-long-arrow-up:before{content:"\f176"}.fa-long-arrow-left:before{content:"\f177"}.fa-long-arrow-right:before{content:"\f178"}.fa-apple:before{content:"\f179"}.fa-windows:before{content:"\f17a"}.fa-android:before{content:"\f17b"}.fa-linux:before{content:"\f17c"}.fa-dribbble:before{content:"\f17d"}.fa-skype:before{content:"\f17e"}.fa-foursquare:before{content:"\f180"}.fa-trello:before{content:"\f181"}.fa-female:before{content:"\f182"}.fa-male:before{content:"\f183"}.fa-gittip:before{content:"\f184"}.fa-sun-o:before{content:"\f185"}.fa-moon-o:before{content:"\f186"}.fa-a
rchive:before{content:"\f187"}.fa-bug:before{content:"\f188"}.fa-vk:before{content:"\f189"}.fa-weibo:before{content:"\f18a"}.fa-renren:before{content:"\f18b"}.fa-pagelines:before{content:"\f18c"}.fa-stack-exchange:before{content:"\f18d"}.fa-arrow-circle-o-right:before{content:"\f18e"}.fa-arrow-circle-o-left:before{content:"\f190"}.fa-caret-square-o-left:before,.fa-toggle-left:before{content:"\f191"}.fa-dot-circle-o:before{content:"\f192"}.fa-wheelchair:before{content:"\f193"}.fa-vimeo-square:before{content:"\f194"}.fa-try:before,.fa-turkish-lira:before{content:"\f195"}.fa-plus-square-o:before{content:"\f196"}.fa-space-shuttle:before{content:"\f197"}.fa-slack:before{content:"\f198"}.fa-envelope-square:before{content:"\f199"}.fa-wordpress:before{content:"\f19a"}.fa-openid:before{content:"\f19b"}.fa-bank:before,.fa-institution:before,.fa-university:before{content:"\f19c"}.fa-graduation-cap:before,.fa-mortar-board:before{content:"\f19d"}.fa-yahoo:before{content:"\f19e"}.fa-google:before{content:"\f1a0"}.fa-reddit:before{content:"\f1a1"}.fa-reddit-square:before{content:"\f1a2"}.fa-stumbleupon-circle:before{content:"\f1a3"}.fa-stumbleupon:before{content:"\f1a4"}.fa-delicious:before{content:"\f1a5"}.fa-digg:before{content:"\f1a6"}.fa-pied-piper-square:before,.fa-pied-piper:before{content:"\f1a7"}.fa-pied-piper-alt:before{content:"\f1a8"}.fa-drupal:before{content:"\f1a9"}.fa-joomla:before{content:"\f1aa"}.fa-language:before{content:"\f1ab"}.fa-fax:before{content:"\f1ac"}.fa-building:before{content:"\f1ad"}.fa-child:before{content:"\f1ae"}.fa-paw:before{content:"\f1b0"}.fa-spoon:before{content:"\f1b1"}.fa-cube:before{content:"\f1b2"}.fa-cubes:before{content:"\f1b3"}.fa-behance:before{content:"\f1b4"}.fa-behance-square:before{content:"\f1b5"}.fa-steam:before{content:"\f1b6"}.fa-steam-square:before{content:"\f1b7"}.fa-recycle:before{content:"\f1b8"}.fa-automobile:before,.fa-car:before{content:"\f1b9"}.fa-cab:before,.fa-taxi:before{content:"\f1ba"}.fa-tree:before{content:"\f1bb"}.fa-spotify:before{content:"\f1bc"}.fa-deviantart:before{content:"\f1bd"}.fa-soundcloud:before{content:"\f1be"}.fa-database:before{content:"\f1c0"}.fa-file-pdf-o:before{content:"\f1c1"}.fa-file-word-o:before{content:"\f1c2"}.fa-file-excel-o:before{content:"\f1c3"}.fa-file-powerpoint-o:before{content:"\f1c4"}.fa-file-image-o:before,.fa-file-photo-o:before,.fa-file-picture-o:before{content:"\f1c5"}.fa-file-archive-o:before,.fa-file-zip-o:before{content:"\f1c6"}.fa-file-audio-o:before,.fa-file-sound-o:before{content:"\f1c7"}.fa-file-movie-o:before,.fa-file-video-o:before{content:"\f1c8"}.fa-file-code-o:before{content:"\f1c9"}.fa-vine:before{content:"\f1ca"}.fa-codepen:before{content:"\f1cb"}.fa-jsfiddle:before{content:"\f1cc"}.fa-life-bouy:before,.fa-life-ring:before,.fa-life-saver:before,.fa-support:before{content:"\f1cd"}.fa-circle-o-notch:before{content:"\f1ce"}.fa-ra:before,.fa-rebel:before{content:"\f1d0"}.fa-empire:before,.fa-ge:before{content:"\f1d1"}.fa-git-square:before{content:"\f1d2"}.fa-git:before{content:"\f1d3"}.fa-hacker-news:before{content:"\f1d4"}.fa-tencent-weibo:before{content:"\f1d5"}.fa-qq:before{content:"\f1d6"}.fa-wechat:before,.fa-weixin:before{content:"\f1d7"}.fa-paper-plane:before,.fa-send:before{content:"\f1d8"}.fa-paper-plane-o:before,.fa-send-o:before{content:"\f1d9"}.fa-history:before{content:"\f1da"}.fa-circle-thin:before{content:"\f1db"}.fa-header:before{content:"\f1dc"}.fa-paragraph:before{content:"\f1dd"}.fa-sliders:before{content:"\f1de"}.fa-share-alt:before{content:"\f1e0"}.fa-share-alt-square:b
efore{content:"\f1e1"}.fa-bomb:before{content:"\f1e2"}.book-langs-index{width:100%;height:100%;padding:40px 0;margin:0;overflow:auto}@media (max-width:600px){.book-langs-index{padding:0}}.book-langs-index .inner{max-width:600px;width:100%;margin:0 auto;padding:30px;background:#fff;border-radius:3px}.book-langs-index .inner h3{margin:0}.book-langs-index .inner .languages{list-style:none;padding:20px 30px;margin-top:20px;border-top:1px solid #eee}.book-langs-index .inner .languages:after,.book-langs-index .inner .languages:before{content:" ";display:table;line-height:0}.book-langs-index .inner .languages li{width:50%;float:left;padding:10px 5px;font-size:16px}@media (max-width:600px){.book-langs-index .inner .languages li{width:100%;max-width:100%}}.book .book-header{overflow:visible;height:50px;padding:0 8px;z-index:2;font-size:.85em;color:#7e888b;background:0 0}.book .book-header .btn{display:block;height:50px;padding:0 15px;border-bottom:none;color:#ccc;text-transform:uppercase;line-height:50px;-webkit-box-shadow:none!important;box-shadow:none!important;position:relative;font-size:14px}.book .book-header .btn:hover{position:relative;text-decoration:none;color:#444;background:0 0}.book .book-header h1{margin:0;font-size:20px;font-weight:200;text-align:center;line-height:50px;opacity:0;padding-left:200px;padding-right:200px;-webkit-transition:opacity .2s ease;-moz-transition:opacity .2s ease;-o-transition:opacity .2s ease;transition:opacity .2s ease;overflow:hidden;text-overflow:ellipsis;white-space:nowrap}.book .book-header h1 a,.book .book-header h1 a:hover{color:inherit;text-decoration:none}@media (max-width:1000px){.book .book-header h1{display:none}}.book .book-header h1 i{display:none}.book .book-header:hover h1{opacity:1}.book.is-loading .book-header h1 i{display:inline-block}.book.is-loading .book-header h1 a{display:none}.dropdown{position:relative}.dropdown-menu{position:absolute;top:100%;left:0;z-index:100;display:none;float:left;min-width:160px;padding:0;margin:2px 0 0;list-style:none;font-size:14px;background-color:#fafafa;border:1px solid rgba(0,0,0,.07);border-radius:1px;-webkit-box-shadow:0 6px 12px rgba(0,0,0,.175);box-shadow:0 6px 12px rgba(0,0,0,.175);background-clip:padding-box}.dropdown-menu.open{display:block}.dropdown-menu.dropdown-left{left:auto;right:4%}.dropdown-menu.dropdown-left .dropdown-caret{right:14px;left:auto}.dropdown-menu .dropdown-caret{position:absolute;top:-8px;left:14px;width:18px;height:10px;float:left;overflow:hidden}.dropdown-menu .dropdown-caret .caret-inner,.dropdown-menu .dropdown-caret .caret-outer{display:inline-block;top:0;border-left:9px solid transparent;border-right:9px solid transparent;position:absolute}.dropdown-menu .dropdown-caret .caret-outer{border-bottom:9px solid rgba(0,0,0,.1);height:auto;left:0;width:auto;margin-left:-1px}.dropdown-menu .dropdown-caret .caret-inner{margin-top:-1px;top:1px;border-bottom:9px solid #fafafa}.dropdown-menu .buttons{border-bottom:1px solid rgba(0,0,0,.07)}.dropdown-menu .buttons:after,.dropdown-menu .buttons:before{content:" ";display:table;line-height:0}.dropdown-menu .buttons:last-child{border-bottom:none}.dropdown-menu .buttons .button{border:0;background-color:transparent;color:#a6a6a6;width:100%;text-align:center;float:left;line-height:1.42857143;padding:8px 4px}.alert,.dropdown-menu .buttons .button:hover{color:#444}.dropdown-menu .buttons .button:focus,.dropdown-menu .buttons .button:hover{outline:0}.dropdown-menu .buttons .button.size-2{width:50%}.dropdown-menu .buttons 
.button.size-3{width:33%}.alert{padding:15px;margin-bottom:20px;background:#eee;border-bottom:5px solid #ddd}.alert-success{background:#dff0d8;border-color:#d6e9c6;color:#3c763d}.alert-info{background:#d9edf7;border-color:#bce8f1;color:#31708f}.alert-danger{background:#f2dede;border-color:#ebccd1;color:#a94442}.alert-warning{background:#fcf8e3;border-color:#faebcc;color:#8a6d3b}.book .book-summary{position:absolute;top:0;left:-300px;bottom:0;z-index:1;width:300px;color:#364149;background:#fafafa;border-right:1px solid rgba(0,0,0,.07);-webkit-transition:left 250ms ease;-moz-transition:left 250ms ease;-o-transition:left 250ms ease;transition:left 250ms ease}.book .book-summary ul.summary{position:absolute;top:0;left:0;right:0;bottom:0;overflow-y:auto;list-style:none;margin:0;padding:0;-webkit-transition:top .5s ease;-moz-transition:top .5s ease;-o-transition:top .5s ease;transition:top .5s ease}.book .book-summary ul.summary li{list-style:none}.book .book-summary ul.summary li.divider{height:1px;margin:7px 0;overflow:hidden;background:rgba(0,0,0,.07)}.book .book-summary ul.summary li i.fa-check{display:none;position:absolute;right:9px;top:16px;font-size:9px;color:#3c3}.book .book-summary ul.summary li.done>a{color:#364149;font-weight:400}.book .book-summary ul.summary li.done>a i{display:inline}.book .book-summary ul.summary li a,.book .book-summary ul.summary li span{display:block;padding:10px 15px;border-bottom:none;color:#364149;background:0 0;text-overflow:ellipsis;overflow:hidden;white-space:nowrap;position:relative}.book .book-summary ul.summary li span{cursor:not-allowed;opacity:.3;filter:alpha(opacity=30)}.book .book-summary ul.summary li a:hover,.book .book-summary ul.summary li.active>a{color:#008cff;background:0 0;text-decoration:none}.book .book-summary ul.summary li ul{padding-left:20px}@media (max-width:600px){.book .book-summary{width:calc(100% - 60px);bottom:0;left:-100%}}.book.with-summary .book-summary{left:0}.book.without-animation .book-summary{-webkit-transition:none!important;-moz-transition:none!important;-o-transition:none!important;transition:none!important}.book{position:relative;width:100%;height:100%}.book .book-body,.book .book-body .body-inner{position:absolute;top:0;left:0;overflow-y:auto;bottom:0;right:0}.book .book-body{color:#000;background:#fff;-webkit-transition:left 250ms ease;-moz-transition:left 250ms ease;-o-transition:left 250ms ease;transition:left 250ms ease}.book .book-body .page-wrapper{position:relative;outline:0}.book .book-body .page-wrapper .page-inner{max-width:800px;margin:0 auto;padding:20px 0 40px}.book .book-body .page-wrapper .page-inner section{margin:0;padding:5px 15px;background:#fff;border-radius:2px;line-height:1.7;font-size:1.6rem}.book .book-body .page-wrapper .page-inner .btn-group .btn{border-radius:0;background:#eee;border:0}@media (max-width:1240px){.book .book-body{-webkit-transition:-webkit-transform 250ms ease;-moz-transition:-moz-transform 250ms ease;-o-transition:-o-transform 250ms ease;transition:transform 250ms ease;padding-bottom:20px}.book .book-body .body-inner{position:static;min-height:calc(100% - 50px)}}@media (min-width:600px){.book.with-summary .book-body{left:300px}}@media (max-width:600px){.book.with-summary{overflow:hidden}.book.with-summary .book-body{-webkit-transform:translate(calc(100% - 60px),0);-moz-transform:translate(calc(100% - 60px),0);-ms-transform:translate(calc(100% - 60px),0);-o-transform:translate(calc(100% - 60px),0);transform:translate(calc(100% - 60px),0)}}.book.without-animation 
.book-body{-webkit-transition:none!important;-moz-transition:none!important;-o-transition:none!important;transition:none!important}.buttons:after,.buttons:before{content:" ";display:table;line-height:0}.button{border:0;background:#eee;color:#666;width:100%;text-align:center;float:left;line-height:1.42857143;padding:8px 4px}.button:hover{color:#444}.button:focus,.button:hover{outline:0}.button.size-2{width:50%}.button.size-3{width:33%}.book .book-body .page-wrapper .page-inner section{display:none}.book .book-body .page-wrapper .page-inner section.normal{display:block;word-wrap:break-word;overflow:hidden;color:#333;line-height:1.7;text-size-adjust:100%;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%;-moz-text-size-adjust:100%}.book .book-body .page-wrapper .page-inner section.normal *{box-sizing:border-box;-webkit-box-sizing:border-box;}.book .book-body .page-wrapper .page-inner section.normal>:first-child{margin-top:0!important}.book .book-body .page-wrapper .page-inner section.normal>:last-child{margin-bottom:0!important}.book .book-body .page-wrapper .page-inner section.normal blockquote,.book .book-body .page-wrapper .page-inner section.normal code,.book .book-body .page-wrapper .page-inner section.normal figure,.book .book-body .page-wrapper .page-inner section.normal img,.book .book-body .page-wrapper .page-inner section.normal pre,.book .book-body .page-wrapper .page-inner section.normal table,.book .book-body .page-wrapper .page-inner section.normal tr{page-break-inside:avoid}.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5,.book .book-body .page-wrapper .page-inner section.normal p{orphans:3;widows:3}.book .book-body .page-wrapper .page-inner section.normal h1,.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5{page-break-after:avoid}.book .book-body .page-wrapper .page-inner section.normal b,.book .book-body .page-wrapper .page-inner section.normal strong{font-weight:700}.book .book-body .page-wrapper .page-inner section.normal em{font-style:italic}.book .book-body .page-wrapper .page-inner section.normal blockquote,.book .book-body .page-wrapper .page-inner section.normal dl,.book .book-body .page-wrapper .page-inner section.normal ol,.book .book-body .page-wrapper .page-inner section.normal p,.book .book-body .page-wrapper .page-inner section.normal table,.book .book-body .page-wrapper .page-inner section.normal ul{margin-top:0;margin-bottom:.85em}.book .book-body .page-wrapper .page-inner section.normal a{color:#4183c4;text-decoration:none;background:0 0}.book .book-body .page-wrapper .page-inner section.normal a:active,.book .book-body .page-wrapper .page-inner section.normal a:focus,.book .book-body .page-wrapper .page-inner section.normal a:hover{outline:0;text-decoration:underline}.book .book-body .page-wrapper .page-inner section.normal img{border:0;max-width:100%}.book .book-body .page-wrapper .page-inner section.normal hr{height:4px;padding:0;margin:1.7em 0;overflow:hidden;background-color:#e7e7e7;border:none}.book .book-body .page-wrapper .page-inner section.normal hr:after,.book .book-body .page-wrapper .page-inner section.normal hr:before{display:table;content:" "}.book .book-body 
.page-wrapper .page-inner section.normal h1,.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5,.book .book-body .page-wrapper .page-inner section.normal h6{margin-top:1.275em;margin-bottom:.85em;}.book .book-body .page-wrapper .page-inner section.normal h1{font-size:2em}.book .book-body .page-wrapper .page-inner section.normal h2{font-size:1.75em}.book .book-body .page-wrapper .page-inner section.normal h3{font-size:1.5em}.book .book-body .page-wrapper .page-inner section.normal h4{font-size:1.25em}.book .book-body .page-wrapper .page-inner section.normal h5{font-size:1em}.book .book-body .page-wrapper .page-inner section.normal h6{font-size:1em;color:#777}.book .book-body .page-wrapper .page-inner section.normal code,.book .book-body .page-wrapper .page-inner section.normal pre{font-family:Consolas,"Liberation Mono",Menlo,Courier,monospace;direction:ltr;border:none;color:inherit}.book .book-body .page-wrapper .page-inner section.normal pre{overflow:auto;word-wrap:normal;margin:0 0 1.275em;padding:.85em 1em;background:#f7f7f7}.book .book-body .page-wrapper .page-inner section.normal pre>code{display:inline;max-width:initial;padding:0;margin:0;overflow:initial;line-height:inherit;font-size:.85em;white-space:pre;background:0 0}.book .book-body .page-wrapper .page-inner section.normal pre>code:after,.book .book-body .page-wrapper .page-inner section.normal pre>code:before{content:normal}.book .book-body .page-wrapper .page-inner section.normal code{padding:.2em;margin:0;font-size:.85em;background-color:#f7f7f7}.book .book-body .page-wrapper .page-inner section.normal code:after,.book .book-body .page-wrapper .page-inner section.normal code:before{letter-spacing:-.2em;content:"\00a0"}.book .book-body .page-wrapper .page-inner section.normal ol,.book .book-body .page-wrapper .page-inner section.normal ul{padding:0 0 0 2em;margin:0 0 .85em}.book .book-body .page-wrapper .page-inner section.normal ol ol,.book .book-body .page-wrapper .page-inner section.normal ol ul,.book .book-body .page-wrapper .page-inner section.normal ul ol,.book .book-body .page-wrapper .page-inner section.normal ul ul{margin-top:0;margin-bottom:0}.book .book-body .page-wrapper .page-inner section.normal ol ol{list-style-type:lower-roman}.book .book-body .page-wrapper .page-inner section.normal blockquote{margin:0 0 .85em;padding:0 15px;opacity:0.75;border-left:4px solid #dcdcdc}.book .book-body .page-wrapper .page-inner section.normal blockquote:first-child{margin-top:0}.book .book-body .page-wrapper .page-inner section.normal blockquote:last-child{margin-bottom:0}.book .book-body .page-wrapper .page-inner section.normal dl{padding:0}.book .book-body .page-wrapper .page-inner section.normal dl dt{padding:0;margin-top:.85em;font-style:italic;font-weight:700}.book .book-body .page-wrapper .page-inner section.normal dl dd{padding:0 .85em;margin-bottom:.85em}.book .book-body .page-wrapper .page-inner section.normal dd{margin-left:0}.book .book-body .page-wrapper .page-inner section.normal .glossary-term{cursor:help;text-decoration:underline}.book .book-body .navigation{position:absolute;top:50px;bottom:0;margin:0;max-width:150px;min-width:90px;display:flex;justify-content:center;align-content:center;flex-direction:column;font-size:40px;color:#ccc;text-align:center;-webkit-transition:all 350ms ease;-moz-transition:all 350ms 